<h1 id="deepspeed-investigation">DeepSpeed Investigation: What I Learned</h1>
<p><em>2021-05-03</em></p>
<p>Deep learning is awesome, but the large compute and data requirements can prevent a lot of amazing people from using the models and contributing to the field. So, when I read about the amazing <a href="https://www.deepspeed.ai/">DeepSpeed</a> library allowing people with just a single GPU (like myself) to train massive models that would normally require multiple GPUs to just fit in memory, I had to investigate further!</p>
<h2 id="what-is-deepspeed">What is DeepSpeed?</h2>
<p>Here is a brief blurb from the DeepSpeed website on what it is and what it can do:</p>
<blockquote>
<p>DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.</p>
<p><strong><em>10x Larger Models</em></strong></p>
<p><strong><em>10x Faster Training</em></strong></p>
<p><strong><em>Minimal Code Change</em></strong></p>
<p>DeepSpeed delivers extreme-scale model training for everyone, from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU.</p>
</blockquote>
<p>Some impressive statements, but are they true? Kind of. Let’s dig a bit deeper into how this works.</p>
<p><img src="https://www.microsoft.com/en-us/research/uploads/prod/2020/05/1400x788DeepSpeedslowed.gif" alt="Overview of the large improvement ZeRO-2 and the DeepSpeed library have over ZeRO-1 and previous approaches." />
<em>From <a href="https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/">https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/</a></em></p>
<p>DeepSpeed is a library that enables the awesome <a href="https://arxiv.org/abs/1910.02054">Zero Redundancy Optimizer (ZeRO)</a>, which is a highly optimized optimizer (oh how clever) that improves memory management and communication in data- or model-parallel workloads by removing redundancy. Now, this might bring up the question: “parallelized workloads? I thought we could use this on a single GPU, what’s the deal?” The deal is that ZeRO was made to solve the problem of communication between multiple devices by doing some nifty memory tricks that are beyond the scope of this blog post (and my understanding; see <a href="https://youtu.be/tC01FRB0M7w">here</a> for a full explanation). It just so happens that the ZeRO optimizer also supports CPU offloading, which moves some of the computation off your GPU and onto your CPU. With things being computed on the CPU, some of the model state is stored in RAM rather than the GPU’s VRAM. This significantly slows computation, since CPUs and RAM weren’t built with this in mind, but it means you can train bigger models and use bigger batch sizes 🤓.</p>
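<p>To make the CPU-offloading idea concrete, here is a minimal sketch of what a DeepSpeed configuration enabling ZeRO with optimizer offloading might look like. It is written as a Python dict for readability (DeepSpeed reads the config from a JSON file); the exact field names and schema are an assumption based on the DeepSpeed docs at the time, so check the current documentation before using them.</p>

```python
import json

# Hedged sketch of a DeepSpeed config: ZeRO stage 2 with the optimizer
# state offloaded to CPU/RAM. Field names are assumptions -- verify them
# against the DeepSpeed configuration reference.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # partition optimizer state + gradients
        "offload_optimizer": {"device": "cpu"},  # do optimizer work on the CPU
    },
}

# DeepSpeed expects the config as a JSON file, e.g. ds_config.json
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

<p>The resulting <code>ds_config.json</code> is what you would point a training script (or the HuggingFace Trainer) at.</p>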
<h2 id="putting-deepspeed-to-the-test">Putting DeepSpeed to the Test!</h2>
<p>To test out DeepSpeed, I used the awesome HuggingFace transformers library, which supports DeepSpeed on its non-stable branch (though support is coming to the stable branch in 4.6 🤓). I followed these awesome <a href="https://huggingface.co/transformers/master/main_classes/trainer.html#deepspeed">instructions</a> on HuggingFace’s website for getting started with DeepSpeed and HuggingFace. If you want to follow along at home, I created a GitHub <a href="https://github.com/ncoop57/deepspeed_testing">repository</a> with the Dockerfile (I’m addicted to Docker and will probably make a blog post on it too :)) and the test script I used to run my experiments. I tried training different versions of the awesome <a href="https://arxiv.org/abs/1910.10683">T5 model</a>, ranging from smallish (~60 million parameters) to humongous (3 billion parameters). And here are my results:</p>
<p><img src="/i-am-a-nerd/images/deepspeed_chart.png" alt="Bar chart showing DeepSpeed increases time to train, but allows training larger models compared to not using DeepSpeed." />
<em>This was run on a machine with Ubuntu 20.04, 32GBs of RAM, Ryzen 5600x, and NVIDIA RTX 3080 GPU.</em></p>
<p>This chart shows each model’s training time in seconds with and without DeepSpeed, using the biggest batch size I could fit in each case. As you can see, using DeepSpeed increases training time, except for t5-small where the times are nearly identical. However, you’ll notice that for t5-base (~220 million parameters) and t5-large (~770 million parameters) I am able to use a larger batch size; this is most noticeable for t5-large, where I can double the batch size. This is the important thing DeepSpeed gives you: it allows larger batch sizes, even at the cost of longer training time. Having large batch sizes is helpful for many deep learning models, as the model sees more examples per update, which can improve performance. That is the use case for DeepSpeed: big models with large batch sizes. If what you are doing doesn’t involve these two things, then you should probably skip DeepSpeed.</p>
<p><strong>Note:</strong> You’ll notice there is no bar for a 3 billion parameter model (t5-3b). This is because my PC cried out when I attempted to train the model even with DeepSpeed and a batch size of 1.</p>
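<p>The batch-size point above can be made concrete with a little arithmetic. The effective batch size a model is updated with is the per-GPU batch size times the gradient accumulation steps times the number of GPUs (the function name and numbers below are illustrative, not from my test scripts):</p>

```python
# Illustrative helper: effective batch size per optimizer update.
def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int = 1, num_gpus: int = 1) -> int:
    return per_gpu_batch * grad_accum_steps * num_gpus

# Doubling the per-GPU batch (as DeepSpeed allowed for t5-large) doubles
# the effective batch size at the same accumulation setting.
print(effective_batch_size(4, grad_accum_steps=4))  # without DeepSpeed
print(effective_batch_size(8, grad_accum_steps=4))  # with DeepSpeed's memory savings
```

<p>Gradient accumulation can simulate a larger batch without DeepSpeed, but each accumulated step still costs a forward/backward pass, so fitting a genuinely larger batch in memory is still a win.</p>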
<h2 id="conclusion-time">Conclusion Time</h2>
<p>So, with all things considered, DeepSpeed is an awesome library and ZeRO is an amazing optimizer. However, if you were looking for super speed boosts on a single GPU like I was, it ain’t it, chief. ZeRO is designed to speed up multi-GPU setups by efficiently handling memory resources and communication, and in doing so it reduces the memory footprint on each GPU. It also does some awesome CPU offloading, which allows you to train huge models with large batch sizes on a single GPU that you normally would not be able to. The larger batch sizes are super important for many deep learning models, as they can improve performance. So, my takeaway from this investigation is this: if you are using a multi-GPU setup, DeepSpeed is the way to go. However, for single-GPU use, only use it if you need a larger model and larger batch sizes than what your GPU can normally handle.</p>
<p>Hope you’ve enjoyed this blog post and learned some information along the way. Comment down below with any questions you have, I’d be happy to help answer them!</p>
<p>Connect with me:</p>
<p>Website - <a href="https://nathancooper.io/#/">https://nathancooper.io/#/</a></p>
<p>YouTube - <a href="https://www.youtube.com/channel/UCKfOCnojK5YV7_hdPjAtY7Q">https://www.youtube.com/channel/UCKfOCnojK5YV7_hdPjAtY7Q</a></p>
<p>Github - <a href="https://github.com/ncoop57">https://github.com/ncoop57</a></p>
<p>Twitter - <a href="https://twitter.com/ncooper57">https://twitter.com/ncooper57</a></p>
<p>LinkedIn - <a href="https://www.linkedin.com/in/nathan-cooper-820292106/">https://www.linkedin.com/in/nathan-cooper-820292106/</a></p>
<h1 id="improved-code-summarization">Improved Code Summarization</h1>
<p><em>By Nathan Cooper, 2020-12-26</em></p>
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: _notebooks/2020-12-26-Improved_Code_Commenter.ipynb
-->
<div class="container" id="notebook-container">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="About">About<a class="anchor-link" href="#About"> </a></h1><p>Hi there! In this post you'll learn how to finetune a RoBERTa-based model that's been trained on code data to automatically generate comments for code!</p>
<p>We will be focusing on the Java programming language, but you can apply the same techniques in this post for any programming language that interests you. Additionally, you'll see how to incorporate this code commenter into a <a href="https://code.visualstudio.com/">VSCode</a> extension so that you can generate comments for code snippets you highlight:</p>
<p>(Insert GIF of tool working)</p>
<p>As always, we'll start with a bit of background of the data and model we are using, but feel free to skip if you want to get straight to the awesomeness ;). Alright, let's GO!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Background">Background<a class="anchor-link" href="#Background"> </a></h1><h2 id="Data">Data<a class="anchor-link" href="#Data"> </a></h2><p>We will be using the awesome <a href="https://github.com/github/codesearchnet">CodeSearchNet</a> Challenge dataset, which contains millions of pairs of methods and their docstrings for a large variety of programming languages. The dataset was initially constructed for evaluating how well different approaches perform at searching for code. However, we can easily repurpose it for our task, and lucky for us, the authors did an awesome job collecting, documenting, and cleaning the data.</p>
<p>We'll be performing a bit more cleaning and formatting of the data as well as adding some more examples. These examples won't be method/docstring pairs, but code snippet/inline comment pairs. This allows our model to generate comments for arbitrary code snippets that a developer may want to document instead of just generating the docstring of a method.</p>
<h2 id="CodeBERT">CodeBERT<a class="anchor-link" href="#CodeBERT"> </a></h2><p>The pretrained model we will be finetuning comes from the awesome paper from Microsoft's research division aptly named <a href="https://arxiv.org/abs/2002.08155">CodeBERT: A Pre-Trained Model for Programming and Natural Languages</a>. This model also used the CodeSearchNet Challenge dataset, but instead of using it to generate comments, it was used to teach a RoBERTa-based model to represent code and natural language in a useful way. Teaching these large language models to represent text in a useful way is common practice now, since these representations have been shown to be helpful when finetuning the models on other tasks. The CodeBERT paper showed these representations are helpful by finetuning them on the programming tasks of code search and comment generation, exactly what we will be doing! The difference between their comment generation task and ours is that we will do a bit more preprocessing, and our model will be able to generate inline comments for code snippets, not just method-level comments.</p>
<p>So, how does CodeBERT learn these representations? It combines two training objectives that have been shown to be useful for natural language: the Masked Language Modeling (MLM) objective, which is from the original <a href="https://arxiv.org/abs/1810.04805">BERT</a> paper, and the Replaced Token Detection (RTD) objective, which is from the <a href="https://arxiv.org/abs/2003.10555">ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators</a> paper. In the MLM objective, we randomly mask out parts of the text we feed into the model and ask the model to predict those masked-out pieces. In the RTD objective, random tokens in the text are replaced and the model has to determine which tokens were replaced. To make this harder for the model, the replaced tokens are meant to be plausible alternatives, not just random words. The CodeBERT model actually used an n-gram-based model to generate these alternatives, whereas the ELECTRA paper used a small BERT-based model.</p>
<p><img src="https://nathancooper.io/i-am-a-nerd/images/electra.png" alt="ELECTRA Pretraining Objective" /> (From ELECTRA Paper)</p>
<p>Instead of applying these training objectives only to natural language, CodeBERT used code and docstrings. This allowed the CodeBERT model to learn a useful representation of code that could be used for other tasks.</p>
<p>Alright, with that quick background knowledge down, let's get into actually finetuning our model!</p>
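<p>As a concrete illustration of the MLM objective described above, here is a toy sketch of masking (illustrative only: real tokenizers work on subwords with a learned vocabulary, not whitespace splits, and BERT's recipe has extra details like sometimes keeping or swapping the token instead of masking):</p>

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """Randomly replace ~mask_prob of tokens with a mask symbol,
    keeping the originals as the prediction targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must predict this token
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

# A small Java snippet, naively whitespace-tokenized for illustration
masked, targets = mask_tokens("public int add ( int a , int b )".split())
print(masked)
print(targets)
```

<p>The model is then trained to recover each entry of <code>targets</code> from the masked sequence; CodeBERT applies this same idea to code/docstring pairs rather than plain text.</p>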
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> nvidia-smi
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Thu Jan 14 20:43:12 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P8 11W / 70W | 0MiB / 15079MiB | 0% Default |
| | | ERR! |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Data">Data<a class="anchor-link" href="#Data"> </a></h1>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>First we'll install the necessary packages and download our data!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Download and install the necessary dependencies</span>
<span class="o">!</span> pip install -q <span class="nv">torch</span><span class="o">==</span><span class="m">1</span>.4.0 -f https://download.pytorch.org/whl/cu101/torch_stable.html
<span class="o">!</span> pip install -q <span class="nv">transformers</span><span class="o">==</span><span class="m">3</span>.5.0 fast-trees
<span class="o">!</span> git clone -q https://github.com/microsoft/CodeXGLUE.git
<span class="c1"># Download the CodeSearchNet Challenge dataset for the Java programming language</span>
<span class="o">!</span> wget -q https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/java.zip
<span class="o">!</span> unzip -qq java.zip
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre> |████████████████████████████████| 753.4MB 21kB/s
<span class="ansi-red-fg">ERROR: torchvision 0.8.1+cu101 has requirement torch==1.7.0, but you'll have torch 1.4.0 which is incompatible.</span>
|████████████████████████████████| 1.3MB 12.6MB/s
|████████████████████████████████| 890kB 51.7MB/s
|████████████████████████████████| 2.9MB 50.3MB/s
|████████████████████████████████| 1.1MB 61.3MB/s
|████████████████████████████████| 112kB 64.8MB/s
|████████████████████████████████| 163kB 60.4MB/s
|████████████████████████████████| 71kB 11.8MB/s
Building wheel for sacremoses (setup.py) ... done
Building wheel for tree-sitter (setup.py) ... done
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Next let's read in our data and since these models take a long time to train, we will only select a subset of the data.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">List</span><span class="p">,</span> <span class="n">Optional</span>
<span class="c1"># Code from CodeSearchNetChallenge: https://github.com/github/CodeSearchNet/blob/master/notebooks/ExploreData.ipynb</span>
<span class="k">def</span> <span class="nf">jsonl_list_to_dataframe</span><span class="p">(</span><span class="n">file_list</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">'code'</span><span class="p">,</span> <span class="s1">'docstring'</span><span class="p">]):</span>
<span class="sd">"""Load a list of jsonl.gz files into a pandas DataFrame."""</span>
<span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">pd</span><span class="o">.</span><span class="n">read_json</span><span class="p">(</span><span class="n">f</span><span class="p">,</span>
<span class="n">orient</span><span class="o">=</span><span class="s1">'records'</span><span class="p">,</span>
<span class="n">compression</span><span class="o">=</span><span class="s1">'gzip'</span><span class="p">,</span>
<span class="n">lines</span><span class="o">=</span><span class="kc">True</span><span class="p">)[</span><span class="n">columns</span><span class="p">]</span>
<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">file_list</span><span class="p">],</span> <span class="n">sort</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_dfs</span><span class="p">(</span><span class="n">path</span><span class="p">:</span> <span class="n">Path</span><span class="p">)</span> <span class="o">-></span> <span class="n">List</span><span class="p">[</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">]:</span>
<span class="sd">"""Grabs the different data splits and converts them into dataframes"""</span>
<span class="n">dfs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">split</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">"train"</span><span class="p">,</span> <span class="s2">"valid"</span><span class="p">,</span> <span class="s2">"test"</span><span class="p">]:</span>
<span class="n">files</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">((</span><span class="n">path</span><span class="o">/</span><span class="n">split</span><span class="p">)</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s2">"**/*.gz"</span><span class="p">))</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">jsonl_list_to_dataframe</span><span class="p">(</span><span class="n">files</span><span class="p">)</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'code'</span><span class="p">:</span> <span class="s1">'mthd'</span><span class="p">,</span> <span class="s1">'docstring'</span><span class="p">:</span> <span class="s1">'cmt'</span><span class="p">})</span>
<span class="n">dfs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
<span class="k">return</span> <span class="n">dfs</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
<span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">df_tst</span> <span class="o">=</span> <span class="n">get_dfs</span><span class="p">(</span><span class="n">path</span><span class="o">/</span><span class="s2">"java/final/jsonl"</span><span class="p">)</span>
<span class="n">sample</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(4545, 153, 269)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's see how the data looks. As shown, we have the data in a good format with one column all of the methods (input into the model) and the other all of the comments (output of the model).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea output_execute_result">
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mthd</th>
<th>cmt</th>
</tr>
</thead>
<tbody>
<tr>
<th>5360</th>
<td>@Override\n public GetLexiconResult getLexi...</td>
<td><p>\nReturns the content of the specified pron...</td>
</tr>
<tr>
<th>9365</th>
<td>public static void checkJavaInternalAccess(ILo...</td>
<td>Prints warning to given {@link ILogger} if Haz...</td>
</tr>
<tr>
<th>10145</th>
<td>private IAtom createAtom(Element element) {\n ...</td>
<td>Create a new atom for the provided symbol. The...</td>
</tr>
<tr>
<th>9008</th>
<td>public void marshall(Scte20PlusEmbeddedDestina...</td>
<td>Marshall the given parameter object.</td>
</tr>
<tr>
<th>24498</th>
<td>@Override\n public void prefetchToken(final F...</td>
<td>/*\nGets hadoop tokens for a user to run mapre...</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Data-Cleaning">Data Cleaning<a class="anchor-link" href="#Data-Cleaning"> </a></h2>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now that we have the data, let's clean it! First, we'll remove any non-ASCII characters to simplify the problem so that the model only has to think about generating English comments.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># From https://stackoverflow.com/a/27084708/5768407</span>
<span class="k">def</span> <span class="nf">is_ascii</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Determines if the given string contains only ascii characters</span>
<span class="sd"> :param s: the string to check</span>
<span class="sd"> :returns: whether or not the given string contains only ascii characters</span>
<span class="sd"> '''</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">s</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">UnicodeDecodeError</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'mthd'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'mthd'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'mthd'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_ascii</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(4402, 141, 264)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Next, we'll remove any outdated comments by checking whether the <a href="https://www.oracle.com/java/technologies/javase/javadoc.html">JavaDoc</a>'s parameter list differs from the method's parameter list. This will also remove pairs where the docstring doesn't actually document the parameters, which probably means the pairs are poor quality (you should always properly document your code :) ).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">fast_trees.core</span> <span class="kn">import</span> <span class="n">FastParser</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">FastParser</span><span class="p">(</span><span class="s1">'java'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_cmt_params</span><span class="p">(</span><span class="n">cmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
<span class="sd">'''</span>
<span class="sd"> Grabs the parameter identifier names from a JavaDoc comment</span>
<span class="sd"> :param cmt: the comment to extract the parameter identifier names from</span>
<span class="sd"> :returns: an array of the parameter identifier names found in the given comment</span>
<span class="sd"> '''</span>
<span class="n">params</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="sa">r</span><span class="s1">'@param\s+\w+'</span><span class="p">,</span> <span class="n">cmt</span><span class="p">)</span>
<span class="n">param_names</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">param</span> <span class="ow">in</span> <span class="n">params</span><span class="p">:</span>
<span class="n">param_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">param</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">return</span> <span class="n">param_names</span>
<span class="k">def</span> <span class="nf">is_outdated</span><span class="p">(</span><span class="n">mthd</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">parser</span><span class="p">:</span> <span class="n">FastParser</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Determines if a given method and comment are outdated by checking</span>
<span class="sd"> if the method's parameter identifier names match the comment's</span>
<span class="sd"> :param mthd: the method to compare against its corresponding comment</span>
<span class="sd"> :param cmt: the comment to compare against its corresponding method</span>
<span class="sd"> :param parser: parser for easily getting the parameter identifier names from a given method</span>
<span class="sd"> :returns: whether or not a given comment is outdated compared to its corresponding method</span>
<span class="sd"> '''</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">mthd_params</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">get_params</span><span class="p">(</span><span class="n">mthd</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="n">cmt_params</span> <span class="o">=</span> <span class="n">get_cmt_params</span><span class="p">(</span><span class="n">cmt</span><span class="p">)</span>
<span class="k">return</span> <span class="n">mthd_params</span> <span class="o">!=</span> <span class="n">cmt_params</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span>
<span class="o">~</span><span class="n">df_trn</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_outdated</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">mthd</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">cmt</span><span class="p">,</span> <span class="n">parser</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)</span>
<span class="p">]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span>
<span class="o">~</span><span class="n">df_val</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_outdated</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">mthd</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">cmt</span><span class="p">,</span> <span class="n">parser</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)</span>
<span class="p">]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span>
<span class="o">~</span><span class="n">df_tst</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">is_outdated</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">mthd</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">cmt</span><span class="p">,</span> <span class="n">parser</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)</span>
<span class="p">]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Downloading repo https://github.com/tree-sitter/tree-sitter-java to /usr/local/lib/python3.6/dist-packages/fast_trees/tree-sitter-java.
</pre>
</div>
</div>
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(4402, 141, 264)</pre>
</div>
</div>
</div>
</div>
</div>
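<p>As a quick aside, the comment side of this check can be sketched without any parser at all. Here is a hedged, self-contained version of the @param extraction (note: this variant uses a capturing group, so findall returns the identifier names directly; it is a simplified illustration, not the post's exact implementation):</p>

```python
import re

def get_cmt_params(cmt: str) -> list:
    # The capturing group makes findall return just the identifier names
    return re.findall(r'@param\s+(\w+)', cmt)

cmt = '''
 * Adds two numbers.
 * @param a the first operand
 * @param b the second operand
 * @return the sum
'''
print(get_cmt_params(cmt))  # ['a', 'b']
```

<p>A method whose parameter list is (a, b) would be kept as up to date; any other parameter list would mark the pair as outdated and drop it.</p>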
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now we'll add in the additional pairs of code snippets/inline comments.</p>
<p>P.S. One thing to note with adding these pairs is that each inline comment will appear twice in the dataset: first inside the method it came from, and second as the target for its code snippet. This is only a problem for the training set, since it lets the model cheat by simply remembering the inline comment from the example method it came from. However, in my testing this didn't seem to be an issue, and the model still works well despite it. Just thought ya should know :).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">tqdm.auto</span> <span class="kn">import</span> <span class="n">tqdm</span>
<span class="k">def</span> <span class="nf">get_inline_pairs</span><span class="p">(</span><span class="n">mthd</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Get all pairs of inline comments and corresponding code snippets</span>
<span class="sd"> :param mthd: the method to retrieve the pairs of comments and corresponding</span>
<span class="sd"> code snippets from</span>
<span class="sd"> :returns: all pairs of comments and corresponding code snippets</span>
<span class="sd"> '''</span>
<span class="n">pairs</span> <span class="o">=</span> <span class="p">[[]]</span>
<span class="n">comment</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">bracket</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">indent_lvl</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">lines</span> <span class="o">=</span> <span class="n">mthd</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">lines</span><span class="p">:</span>
<span class="k">if</span> <span class="s2">"//"</span> <span class="ow">in</span> <span class="n">line</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">bracket</span> <span class="ow">and</span> <span class="ow">not</span> <span class="s2">"://"</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="n">pairs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="k">if</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="n">indent_lvl</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">indent_lvl</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"//"</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span>
<span class="n">comment</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">bracket</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">elif</span> <span class="n">comment</span><span class="p">:</span>
<span class="k">if</span> <span class="s1">'{'</span> <span class="ow">in</span> <span class="n">line</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">bracket</span><span class="p">:</span>
<span class="n">bracket</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">pairs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="k">elif</span> <span class="s1">'}'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="n">line_indent</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="k">if</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="n">line_indent</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">line_indent</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"//"</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="s1">' '</span><span class="p">)</span>
<span class="k">if</span> <span class="n">indent_lvl</span> <span class="o">==</span> <span class="n">line_indent</span><span class="p">:</span>
<span class="n">pairs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">bracket</span><span class="p">:</span>
<span class="n">pairs</span><span class="o">.</span><span class="n">append</span><span class="p">([])</span>
<span class="n">comment</span> <span class="o">=</span> <span class="kc">False</span>
<span class="n">bracket</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">elif</span> <span class="n">line</span><span class="o">.</span><span class="n">isspace</span><span class="p">()</span> <span class="ow">or</span> <span class="n">line</span> <span class="o">==</span> <span class="s1">''</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">bracket</span><span class="p">:</span>
<span class="n">pairs</span><span class="o">.</span><span class="n">append</span><span class="p">([])</span>
<span class="n">comment</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">pairs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="c1"># Convert pairs into proper format of (code snippet, inline comment) dataframe</span>
<span class="n">code_snippets</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">comments</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">pairs</span><span class="p">:</span>
<span class="k">if</span> <span class="n">pair</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">pair</span><span class="p">)</span> <span class="o"><</span> <span class="mi">5</span><span class="p">:</span>
<span class="n">code</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">comment</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">skip</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">pair</span><span class="p">:</span>
<span class="k">if</span> <span class="s2">"TODO"</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span> <span class="k">break</span>
<span class="k">if</span> <span class="s2">"//"</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="n">comment</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'//'</span><span class="p">,</span> <span class="s1">''</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">code</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">code</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">comment</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">code_snippets</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">code</span><span class="p">))</span>
<span class="n">comments</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">comment</span><span class="p">))</span>
<span class="n">pairs</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">code_snippets</span><span class="p">,</span> <span class="n">comments</span><span class="p">),</span> <span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"mthd"</span><span class="p">,</span> <span class="s2">"cmt"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">pairs</span>
<span class="k">def</span> <span class="nf">add_inline</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-></span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Helper function to go through all methods in a given dataframe and add all</span>
<span class="sd"> pairs of inline comments and corresponding code snippets</span>
<span class="sd"> :param df: the dataframe to retrieve and add all pairs of inline comments</span>
<span class="sd"> and corresponding code snippets to</span>
<span class="sd"> :returns: a new dataframe with the newly added pairs of inline comments and</span>
<span class="sd"> corresponding code snippets</span>
<span class="sd"> '''</span>
<span class="n">new_df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s1">'mthd'</span><span class="p">]</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="s2">"//"</span><span class="p">)]</span>
<span class="n">all_pairs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">mthd</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">new_df</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">values</span><span class="p">):</span>
<span class="n">pairs</span> <span class="o">=</span> <span class="n">get_inline_pairs</span><span class="p">(</span><span class="n">mthd</span><span class="p">)</span>
<span class="n">all_pairs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">pairs</span><span class="p">)</span>
<span class="n">df_pairs</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">pairs</span> <span class="k">for</span> <span class="n">pairs</span> <span class="ow">in</span> <span class="n">all_pairs</span><span class="p">])</span>
<span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df</span><span class="p">,</span> <span class="n">df_pairs</span><span class="p">])</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">add_inline</span><span class="p">(</span><span class="n">df_trn</span><span class="p">)</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">add_inline</span><span class="p">(</span><span class="n">df_val</span><span class="p">)</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">add_inline</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(4584, 150, 271)</pre>
</div>
</div>
</div>
</div>
</div>
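<p>To make the pairing heuristic above concrete, here is a toy, self-contained sketch of the same idea (simplified on purpose: it pairs each // comment with the lines that follow it until a blank line, without the bracket and indentation tracking the real function does):</p>

```python
def simple_inline_pairs(mthd: str):
    # Pair each // comment with the code lines that follow it
    pairs, current = [], None
    for line in mthd.split('\n'):
        stripped = line.strip()
        if stripped.startswith('//'):
            current = (stripped.lstrip('/ '), [])
            pairs.append(current)
        elif stripped == '':
            current = None  # a blank line ends the current snippet
        elif current is not None:
            current[1].append(stripped)
    # Keep only comments that actually captured some code
    return [(cmt, '\n'.join(code)) for cmt, code in pairs if code]

java = '''int total = 0;
// sum the array
for (int x : xs) {
    total += x;
}

return total;'''
print(simple_inline_pairs(java))
```

<p>On this input the sketch yields a single (comment, snippet) pair for the loop; the real implementation additionally filters by pair length and skips TODO comments.</p>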
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>We'll also remove pairs where the code is shorter than the comment. I found that in these cases the comments contain a bunch of extra information that the model won't have access to, such as how the method is used by other methods in the software system.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="n">df_trn</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">)</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="n">df_val</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">)</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="n">df_tst</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">)</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(3713, 111, 228)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Next, we'll remove any examples whose comment contains the special <code> tag, since these also tend to contain extra information that the model has little hope of generating.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">has_code</span><span class="p">(</span><span class="n">cmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Determine if the given comment contains the HTML <code> tag</span>
<span class="sd"> :param cmt: the comment to check whether it contains the HTML <code> tag</span>
<span class="sd"> :returns: whether or not the given comment contains the HTML <code> tag</span>
<span class="sd"> '''</span>
<span class="k">if</span> <span class="s1">'<code>'</span> <span class="ow">in</span> <span class="n">cmt</span><span class="p">:</span> <span class="k">return</span> <span class="kc">True</span>
<span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="o">~</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">has_code</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="o">~</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">has_code</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="o">~</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">has_code</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(3580, 104, 221)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Lastly, we're gonna remove the JavaDoc tags from the comments, leaving only the description, since that's really all we care about. The other pieces of information can usually be autogenerated or may require external knowledge to document.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">remove_jdocs</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">)</span> <span class="o">-></span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Remove the JavaDocs leaving only the description of the comment</span>
<span class="sd"> :param df: the pandas dataframe to remove the JavaDocs from</span>
<span class="sd"> :returns: a new pandas dataframe with the JavaDocs removed</span>
<span class="sd"> '''</span>
<span class="n">methods</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">comments</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">())):</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s2">"cmt"</span><span class="p">]</span>
<span class="c1"># Remove {} text in comments from https://stackoverflow.com/questions/14596884/remove-text-between-and-in-python/14598135</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s2">"([\{\[]).*?([\]\}])"</span><span class="p">,</span> <span class="s1">''</span><span class="p">,</span> <span class="n">comment</span><span class="p">)</span>
<span class="n">cleaned</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">comment</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">):</span>
<span class="k">if</span> <span class="s2">"@"</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span> <span class="k">break</span>
<span class="n">cleaned</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="n">comments</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">cleaned</span><span class="p">))</span>
<span class="n">methods</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="s2">"mthd"</span><span class="p">])</span>
<span class="n">new_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">methods</span><span class="p">,</span> <span class="n">comments</span><span class="p">),</span> <span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"mthd"</span><span class="p">,</span> <span class="s2">"cmt"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">new_df</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">remove_jdocs</span><span class="p">(</span><span class="n">df_trn</span><span class="p">);</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">remove_jdocs</span><span class="p">(</span><span class="n">df_val</span><span class="p">);</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">remove_jdocs</span><span class="p">(</span><span class="n">df_tst</span><span class="p">);</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Almost there! In this step, we'll remove any HTML tags from the comments so the model doesn't have to also learn HTML. Bless those that do...</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">clean_html</span><span class="p">(</span><span class="n">cmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Remove any HTML tags from a given comment</span>
<span class="sd"> :param cmt: the comment to remove any HTML tags from</span>
<span class="sd"> :returns: the comment with any HTML tags removed</span>
<span class="sd"> '''</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s2">"<.?span[^>]*>|<.?code[^>]*>|<.?p[^>]*>|<.?hr[^>]*>|<.?h[1-3][^>]*>|<.?a[^>]*>|<.?b[^>]*>|<.?blockquote[^>]*>|<.?del[^>]*>|<.?dd[^>]*>|<.?dl[^>]*>|<.?dt[^>]*>|<.?em[^>]*>|<.?i[^>]*>|<.?img[^>]*>|<.?kbd[^>]*>|<.?li[^>]*>|<.?ol[^>]*>|<.?pre[^>]*>|<.?s[^>]*>|<.?sup[^>]*>|<.?sub[^>]*>|<.?strong[^>]*>|<.?strike[^>]*>|<.?ul[^>]*>|<.?br[^>]*>"</span><span class="p">,</span> <span class="s2">""</span><span class="p">,</span> <span class="n">cmt</span><span class="p">)</span>
<span class="k">return</span> <span class="n">result</span>
<span class="n">df_trn</span><span class="o">.</span><span class="n">cmt</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">clean_html</span><span class="p">)</span>
<span class="n">df_val</span><span class="o">.</span><span class="n">cmt</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">clean_html</span><span class="p">)</span>
<span class="n">df_tst</span><span class="o">.</span><span class="n">cmt</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">clean_html</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
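<p>The regex in <code>clean_html</code> only strips the tags it explicitly lists, so any tag outside that list survives. A more general alternative is to let the standard library's <code>html.parser</code> walk the markup and keep only the text content. This is a minimal sketch, not what the notebook above actually runs, and <code>strip_tags</code> is a name I'm introducing:</p>

```python
from html.parser import HTMLParser

class _TagStripper(HTMLParser):
    """Collects only the text content of a document, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)

def strip_tags(cmt: str) -> str:
    """Remove all HTML tags from a comment (hypothetical alternative to clean_html)."""
    parser = _TagStripper()
    parser.feed(cmt)
    return ''.join(parser.chunks)
```

<p>Note the behavioral difference: <code>strip_tags('&lt;code&gt;foo&lt;/code&gt; bar')</code> returns <code>'foo bar'</code>, and it would also remove tags that <code>clean_html</code>'s list doesn't cover.</p>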
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>FINALLY!! We'll make everything lower case, remove extra whitespace, remove empty comments, and remove duplicates.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">applymap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span><span class="o">.</span><span class="n">lower</span><span class="p">())</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">applymap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span><span class="o">.</span><span class="n">lower</span><span class="p">())</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">applymap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span><span class="o">.</span><span class="n">lower</span><span class="p">())</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="o">~</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="o">~</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="o">~</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'cmt'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(3094, 94, 205)</pre>
</div>
</div>
</div>
</div>
</div>
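<p>The cell-by-cell normalization above boils down to two plain-string operations: collapse all whitespace runs (including newlines and tabs) to single spaces, then lowercase; afterwards, drop empties and duplicates. A standalone sketch of the same transform without pandas, with helper names of my own choosing:</p>

```python
def normalize(text: str) -> str:
    """Collapse whitespace runs (spaces, tabs, newlines) to single spaces and lowercase."""
    return ' '.join(text.split()).lower()

def dedupe_nonempty(comments):
    """Drop empty strings and keep only the first occurrence of each comment."""
    seen = set()
    result = []
    for c in comments:
        if c and c not in seen:
            seen.add(c)
            result.append(c)
    return result
```

<p>For example, <code>normalize("  Marshall   the\nGiven Parameter  ")</code> yields <code>"marshall the given parameter"</code>, matching what the <code>applymap</code> lambda does to each dataframe cell.</p>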
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now let's see what the data looks like.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea output_execute_result">
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mthd</th>
<th>cmt</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>public static void checkjavainternalaccess(ilo...</td>
<td>prints warning to given if hazelcast is not pr...</td>
</tr>
<tr>
<th>1</th>
<td>public void marshall(scte20plusembeddeddestina...</td>
<td>marshall the given parameter object.</td>
</tr>
<tr>
<th>2</th>
<td>@override public void prefetchtoken(final file...</td>
<td>/* gets hadoop tokens for a user to run mapred...</td>
</tr>
<tr>
<th>3</th>
<td>@override public <y> singularattribute<x, y> g...</td>
<td>/* (non-javadoc)</td>
</tr>
<tr>
<th>4</th>
<td>public void sync(boolean syncallsegments) { co...</td>
<td>forces a disk flush on the commit log files th...</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Data-Exploring">Data Exploring<a class="anchor-link" href="#Data-Exploring"> </a></h2><p>As good Data Scientists, we will also explore our data to uncover any secrets. Data can be sneaky like that :).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">Counter</span>
<span class="kn">from</span> <span class="nn">statistics</span> <span class="kn">import</span> <span class="n">mean</span><span class="p">,</span> <span class="n">median</span><span class="p">,</span> <span class="n">stdev</span>
<span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">AutoTokenizer</span>
<span class="k">def</span> <span class="nf">get_counter</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">:</span> <span class="n">AutoTokenizer</span><span class="p">,</span> <span class="n">col</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Counter</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Get the counts for each token in a given pandas dataframe column</span>
<span class="sd"> :param df: the pandas dataframe to get the counts of tokens from</span>
<span class="sd"> :param tokenizer: the tokenizer to use for tokenizing the rows in the pandas</span>
<span class="sd"> dataframe</span>
<span class="sd"> :param col: the column to grab rows from when tokenizing</span>
<span class="sd"> :returns: the counts of each token in the given pandas dataframe</span>
<span class="sd"> column</span>
<span class="sd"> '''</span>
<span class="n">toks</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">toks</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">col</span><span class="p">]))</span>
<span class="n">cnt</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">()</span>
<span class="k">for</span> <span class="n">tok</span> <span class="ow">in</span> <span class="n">toks</span><span class="p">:</span>
<span class="n">cnt</span><span class="p">[</span><span class="n">tok</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">cnt</span>
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">AutoTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="s1">'microsoft/codebert-base'</span><span class="p">)</span>
<span class="n">mthd_cnt</span> <span class="o">=</span> <span class="n">get_counter</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="s1">'mthd'</span><span class="p">)</span>
<span class="n">cmt_cnt</span> <span class="o">=</span> <span class="n">get_counter</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="s1">'cmt'</span><span class="p">)</span>
<span class="n">mthd_lens</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span><span class="o">.</span><span class="n">values</span>
<span class="n">cmt_lens</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span><span class="o">.</span><span class="n">values</span>
<span class="n">max_mthd_len</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">mthd_lens</span><span class="p">,</span> <span class="mf">0.95</span><span class="p">))</span>
<span class="n">max_cmt_len</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">cmt_lens</span><span class="p">,</span> <span class="mf">0.95</span><span class="p">))</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
</div>
</div>
</div>
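<p>Capping sequence lengths at the 95th percentile (the <code>np.quantile(..., 0.95)</code> calls above) trades truncating a small tail of long samples for much shorter padded batches. The idea can be sketched in pure Python with a simple nearest-rank percentile; <code>cap_at_quantile</code> is a name I'm introducing, and this is an approximation of (not identical to) numpy's interpolating quantile:</p>

```python
import math

def cap_at_quantile(lengths, q=0.95):
    """Return the smallest length L such that at least a fraction q of the
    samples have length <= L (nearest-rank percentile)."""
    s = sorted(lengths)
    rank = math.ceil(q * len(s))  # 1-indexed rank of the q-th percentile
    return s[rank - 1]
```

<p>With token lengths of 1 through 100, this picks 95: sequences longer than that get truncated, while 95% of the data fits unchanged.</p>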
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="k">def</span> <span class="nf">plot_counts</span><span class="p">(</span><span class="n">counts</span><span class="p">:</span><span class="n">Counter</span><span class="p">,</span> <span class="n">top_k</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="mi">30</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Plot a bar chart of the most common tokens</span>
<span class="sd"> :param counts: the counts of each token</span>
<span class="sd"> :param top_k: the number of tokens to display in the plot</span>
<span class="sd"> '''</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">values</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">counts</span><span class="o">.</span><span class="n">most_common</span><span class="p">()[:</span><span class="n">top_k</span><span class="p">])</span>
<span class="n">indexes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
<span class="n">width</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">num</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">22</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">60</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'w'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s1">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">indexes</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">width</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">indexes</span> <span class="o">+</span> <span class="n">width</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's look at the most common tokens in our methods and comments.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">plot_counts</span><span class="p">(</span><span class="n">mthd_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
<span class="n">plot_counts</span><span class="p">(</span><span class="n">cmt_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABCsAAADQCAYAAAAu0euYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df1RUdf7H8ZeaYuoCEma0QFaSWpttq7kGDgwYSGhmpuOP8Ehn7Yfalluart+OpshZs5PZ1mZ6FLHSFFyz3TBUxFF+mKUn7ZeamRbiqZXGAM3k1/3+4WGCBBSdHxd8Ps7xHLnMfN6f98zlMrzmc++0MgzDEAAAAAAAgEm09vYEAAAAAAAAaiOsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmMpV3p5AQ/r06aObb77Z29MAAAAAAABucvjwYe3Zs+e87aYNK26++Walp6d7exoAAAAAAMBNbDZbvds5DQQAAAAAAJgKYQUAAAAAADAVwgoAAAAAAGAqhBUAAAAAAMBUCCsAAAAAAICpEFYAAAAAAABTIawAAAAAAACmcpW3J9ASdZuR6bXaR+cP9lptAAAAAABcgZUVAAAAAADAVAgrAAAAAACAqRBWAAAAAAAAUyGsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmAphBQAAAAAAMBXCCgAAAAAAYCqEFQAAAAAAwFQIKwAAAAAAgKk0GlZ89NFHuvvuuxUZGakxY8aooqJCYWFhslqtslqt2rJliyTpwIEDioyMVHh4uLZu3SpJOn36tIYPH64BAwZowYIFzjGnT58ui8WicePGqaKiwo2tAQAAAACA5qjRsCIkJEQ5OTnasWOHunXrpvfee09+fn6y2+2y2+2KjY2VJM2cOVPLly9XVlaWZs2aJUlatmyZEhISlJeXp5ycHBUVFWnfvn0qKipSbm6uevbsqXXr1rm/QwAAAAAA0Kw0GlYEBQXp6quvliS1a9dOrVu31qlTpxQVFaWxY8fK4XBIko4fP66wsDD5+voqICBAxcXFKigoUFxcnCQpNjZWO3furLMtPj5e+fn57uwNAAAAAAA0Qxd1zYpvv/1Wmzdv1n333af8/Hxt375d8fHxmj17tiSpurraeVs/Pz85HA6dPHlSvr6+F9xWW0ZGhmw2m2w2mwoLC13SIAAAAAAAaF4uGFaUlpZq3LhxSktLU9u2bXXNNddIkkaMGKF9+/adG6T1r8OUlJQoICBA/v7+Ki0tveC22kaOHKn09HSlp6crJCTENR0CAAAAAIBmpdGworKyUqNHj9bs2bPVo0cPlZeX6+zZs5Kk3Nxcde/eXdK500UOHz6ssrIyORwOBQYGKjw8XNnZ2ZKk7Oxs9e/fv862TZs2KSIiwp29AQAAAACAZuiqxr75zjvvaNeuXUpOTlZycrImTpyoBQsWqGPHjvLx8VFqaqokKSUlRUlJSaqqqtKcOXMkSRMmTFBiYqJSU1M1ZMgQBQcHKzg4WF27dpXFYlFoaKimTp3q/g4BAAAAAECz0sowDMPbk6iPzWZTenq6t6dxSbrNyPRa7aPzB3utNgAAAAAATdHQ3/4XdYFNAAAAAAAATyGsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmAphBQAAAAAAMBXCCgAAAAAAYCqEFQAAAAAAwFQIKwAAAAAAgKkQVgAAAAAAAFMhrAAAAAAAAKZCWAEAAAAAAEyFsAIAAAAAAJgKYQUAAAAAADAVwgoAAAAAAGAqhBUAAAAAAMBUCCsAAAAAAICpEFYAAAAAAABTIawAAAAAAACmQlgBAAAAAABMhbACAAAAAACYCmEFAAAAAAAwFcIKAAAAAABgKoQVAAAAAADAVAgrAAAAAACAqTQaVnz00Ue6++67FRkZqTFjxqiiokIZGRkKDw/XwIEDdezYMUnSgQMHFBkZqfDwcG3dulWSdPr0
aQ0fPlwDBgzQggULnGNOnz5dFotF48aNU0VFhRtbAwAAAAAAzVGjYUVISIhycnK0Y8cOdevWTe+9954WLlwou92uuXPnKjk5WZI0c+ZMLV++XFlZWZo1a5YkadmyZUpISFBeXp5ycnJUVFSkffv2qaioSLm5uerZs6fWrVvn/g4BAAAAAECz0mhYERQUpKuvvlqS1K5dOx08eFC9evVSu3btFBERoU8//VSSdPz4cYWFhcnX11cBAQEqLi5WQUGB4uLiJEmxsbHauXNnnW3x8fHKz893Z28AAAAAAKAZuupibvTtt99q8+bNmj9/vk6cOOHcXlVVJUmqrq52bvPz85PD4dDJkyfl6+t73ragoKA622rLyMhQRkaGJKmwsPAy2gIAAAAAAM3VBcOK0tJSjRs3TmlpaaqqqlJpaanze23atJEktW796wKNkpISBQQEyN/fX6WlpfL391dJSYluuOEGVVZWOu9fc7vaRo4cqZEjR0qSbDbb5XcHAAAAAACanUZPA6msrNTo0aM1e/Zs9ejRQ2FhYdq/f7/Ky8tVUFCg3r17Szp3usjhw4dVVlYmh8OhwMBAhYeHKzs7W5KUnZ2t/v3719m2adMmRUREuLk9AAAAAADQ3DS6suKdd97Rrl27lJycrOTkZE2cOFFTpkyR1WpV+/bttXLlSklSSkqKkpKSVFVVpTlz5kiSJkyYoMTERKWmpmrIkCEKDg5WcHCwunbtKovFotDQUE2dOtX9HQIAAAAAgGallWEYhrcnUR+bzab09HRvT+OSdJuR6bXaR+cP9lptAAAAAACaoqG//Rs9DQQAAAAAAMDTCCsAAAAAAICpEFYAAAAAAABTIawAAAAAAACmQlgBAAAAAABMhbACAAAAAACYCmEFAAAAAAAwFcIKAAAAAABgKoQVAAAAAADAVAgrAAAAAACAqRBWAAAAAAAAUyGsAAAAAAAApnKVtycA1+o2I9NrtY/OH+y12gAAAACAloOVFQAAAAAAwFQIKwAAAAAAgKkQVgAAAAAAAFMhrAAAAAAAAKZCWAEAAAAAAEyFsAIAAAAAAJgKYQUAAAAAADAVwgoAAAAAAGAqhBUAAAAAAMBUCCsAAAAAAICpEFYAAAAAAABTaTSsKCkpUb9+/dSpUyd9/vnnkqSwsDBZrVZZrVZt2bJFknTgwAFFRkYqPDxcW7dulSSdPn1aw4cP14ABA7RgwQLnmNOnT5fFYtG4ceNUUVHhrr4AAAAAAEAz1WhY0aFDB2VmZmrEiBHObX5+frLb7bLb7YqNjZUkzZw5U8uXL1dWVpZmzZolSVq2bJkSEhKUl5ennJwcFRUVad++fSoqKlJubq569uypdevWubE1AAAAAADQHDUaVrRt21ZdunSps+3UqVOKiorS2LFj5XA4JEnHjx9XWFiYfH19FRAQoOLiYhUUFCguLk6SFBsbq507d9bZFh8fr/z8/DpjZ2RkyGazyWazqbCw0GVNAgAAAACA5qPJ16zIz8/X9u3bFR8fr9mzZ0uSqqurnd/38/OTw+HQyZMn5evre8FttY0cOVLp6elKT09XSEjIJTcFAAAAAACaryaHFddcc40kacSIEdq3b9+5QVr/OkxJSYkCAgLk7++v0tLSC24DAAAAAACorUlhRXl5uc6ePStJys3NVffu3SVJQUFBOnz4sMrKyuRwOBQYGKjw8HBlZ2dLkrKzs9W/f/862zZt2qSIiAhX9gIAAAAAAFqAqy50g4SEBO3du1cHDx7UsGHDlJ6ero4dO8rHx0epqamSpJSUFCUlJamqqkpz5syRJE2YMEGJiYlKTU3VkCFDFBwcrODgYHXt2lUWi0WhoaGaOnWqe7sDAAAAAADNTivDMAxvT6I+NptN6enp3p7GJek2I9PbU/CKo/MHe3sKAAAAAIBmpKG//Zt8zQoAAAAAAAB3IqwAAAAAAACmQlgBAAAAAABMhbACAAAAAACYCmEFAAAAAAAwFcIKAAAAAABgKoQVAAAA
AADAVAgrAAAAAACAqRBWAAAAAAAAUyGsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmAphBQAAAAAAMBXCCgAAAAAAYCqEFQAAAAAAwFQIKwAAAAAAgKlc5e0JoOXoNiPTa7WPzh/stdoAAAAAANcirECLQFACAAAAAC0Hp4EAAAAAAABTIawAAAAAAACmQlgBAAAAAABMhbACAAAAAACYChfYBC4TF/cEAAAAANdqdGVFSUmJ+vXrp06dOunzzz+XJGVkZCg8PFwDBw7UsWPHJEkHDhxQZGSkwsPDtXXrVknS6dOnNXz4cA0YMEALFixwjjl9+nRZLBaNGzdOFRUV7uoLAAAAAAA0U42urOjQoYMyMzM1bdo0SVJlZaUWLlyo7du36+OPP1ZycrKWLFmimTNnavny5eratavuvfdeDRw4UMuWLVNCQoImTJig+Ph4PfTQQyouLlZRUZFyc3OVkpKidevWacyYMR5pFGiJWNUBAAAAoCVqdGVF27Zt1aVLF+fXhw4dUq9evdSuXTtFRETo008/lSQdP35cYWFh8vX1VUBAgIqLi1VQUKC4uDhJUmxsrHbu3FlnW3x8vPLz893VFwAAAAAAaKaadM2KkydPytfX1/l1VVWVJKm6utq5zc/PTw6Ho85ta28LCgqqs622jIwMZWRkSJIKCwsvoR0AAAAAANDcNSms8Pf3V2lpqfPrNm3aSJJat/51gUZJSYkCAgKct/X391dJSYluuOEGVVZWOu9fc7vaRo4cqZEjR0qSbDbbpXUEAAAAAACatSZ9dGlYWJj279+v8vJyFRQUqHfv3pKkoKAgHT58WGVlZXI4HAoMDFR4eLiys7MlSdnZ2erfv3+dbZs2bVJERISL2wEAAAAAAM3dBVdWJCQkaO/evTp48KAee+wxTZkyRVarVe3bt9fKlSslSSkpKUpKSlJVVZXmzJkjSZowYYISExOVmpqqIUOGKDg4WMHBweratassFotCQ0M1depU93YHAAAAAACanQuGFRs3bjxv26hRo+p8feuttyo3N7fOtk6dOmnDhg3n3ffFF19s6hwBAAAAAMAVpEmngQAAAAAAALhbky6wCQA1us3I9Frto/MHe602AAAAAPdjZQUAAAAAADAVwgoAAAAAAGAqnAYCoNnhFBQAAACgZSOsAIAmICgBAAAA3I/TQAAAAAAAgKkQVgAAAAAAAFMhrAAAAAAAAKZCWAEAAAAAAEyFsAIAAAAAAJgKYQUAAAAAADAVProUAJoJPjYVAAAAVwpWVgAAAAAAAFNhZQUA4IJY1QEAAABPYmUFAAAAAAAwFcIKAAAAAABgKpwGAgAwNU5BAQAAuPKwsgIAAAAAAJgKYQUAAAAAADAVwgoAAAAAAGAqXLMCAIAGcL0MAAAA72BlBQAAAAAAMBXCCgAAAAAAYCpNDiuOHj2qLl26yGq1ymq16sSJE8rIyFB4eLgGDhyoY8eOSZIOHDigyMhIhYeHa+vWrZKk06dPa/jw4RowYIAWLFjg2k4AAAAAAECLcEkrK6KiomS322W329W5c2ctXLhQdrtdc+fOVXJysiRp5syZWr58ubKysjRr1ixJ0rJly5SQkKC8vDzl5OSoqKjIdZ0AAAAAAIAW4ZLCivz8fFksFs2cOVOHDh1Sr1691K5dO0VEROjTTz+VJB0/flxhYWHy9fVVQECAiouLVVBQoLi4OElSbGysdu7c6bpOAAAAAABAi9DkTwMJCgrS119/rQ4dOuiRRx7R+vXr5evr6/x+VVWVJKm6utq5zc/PTw6HQydPnnTetmZbbRkZGcrIyJAkFRYWNr0bAAAAAADQ7DV5ZYWPj486duyoVq1aafjw4dq3b59KS0ud32/Tps25gVv/OnRJSYkCAgLk7+/vvG3NttpGjhyp9PR0paenKyQk5JIaAgAAAAAAzVuTw4qysjLn/3NzczV48GDt379f5eXlKigoUO/evSWdW4Fx+PBhlZWVyeFwKDAwUOHh4crO
zpYkZWdnq3///i5qAwAAAAAAtBRNPg0kLy9Pzz33nDp06KAbb7xRycnJat++vaxWq9q3b6+VK1dKklJSUpSUlKSqqirNmTNHkjRhwgQlJiYqNTVVQ4YMUXBwsGu7AQAAAAAAzV4rwzAMb0+iPjabTenp6d6exiXpNiPT21MAADRzR+cP9vYUAAAA3K6hv/0v6dNAAAAAAAAA3KXJp4EAAAD38+YqPVZ1AAAAbyOsAAAAdRCUAAAAb+M0EAAAAAAAYCqEFQAAAAAAwFQ4DQQAAJjGlfqJWpz+AgBAXYQVAAAAXsZ1QgAAqIuwAgAA4ApGUAIAMCPCCgAAAHgFQQkAoCGEFQAAALjiEJQAgLkRVgAAAAAeRFACABdGWAEAAABcIa7UT9y5UhFOoTlr7e0JAAAAAAAA1MbKCgAAAABoga7UlTSsKGkZCCsAAAAAAC0GIU3LwGkgAAAAAADAVAgrAAAAAACAqRBWAAAAAAAAUyGsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmAphBQAAAAAAMBXCCgAAAAAAYCqEFQAAAAAAwFS8ElZMnz5dFotF48aNU0VFhTemAAAAAAAATMrjYcW+fftUVFSk3Nxc9ezZU+vWrfP0FAAAAAAAgIld5emCBQUFiouLkyTFx8drxYoVGjNmjCQpIyNDGRkZkqTdu3fLZrN5enou0c+LtQsLCxUSEkJtalOb2tSmNrWpTW1qU5va1L6Cat999zyv1b4chw8frv8bhoelpKQY7777rmEYhnHo0CFjzJgxnp5CizZy5EhqU5va1KY2talNbWpTm9rUpja1mzWPnwbi7++v0tJSSVJJSYkCAgI8PQUAAAAAAGBibZ5//vnnPVmwbdu2WrVqlR544AGlpaWpd+/euv322z05hRbvtttuoza1qU1talOb2tSmNrWpTW1qU7vZamUYhuHpotOmTdOHH36o0NBQrVixQu3atfP0FAAAAAAAgEl5JawAAAAAAABoiMevWQGgeXr44Ye9PQWvutL7h2e0tP2sOfTjiTl6+3Hwdn38iucC7sY+hpaEsAIus3HjRi1fvtxj9bx5ML6SfhHk5+crOjpaR44cUVRUlP797397dT6FhYX629/+5rF6DfU/evRoVVZWur2+p/tFwzZs2KD//e9/bhm7vv1sypQpOnPmjH7++WdZrVbdc889bqldo+YYPmrUKF3uostLOW789NNPSk9Pv6y6TdHQHJ977jnt27dPkjR+/Hg5HA6X1/CUy63ft29fSVJSUpI+//xzd0zRK1y5r1+s+p4Lq9Uqu90uT10+rqZvq9Wq559/Xna7/bLHbC6vh5YuXer8f82xtaXx9vHmSrZ3714tXry4zraa46er1d6Xa/v+++81e/Zst9T0Km9+FAlalqFDhxpnzpxxe528vDzDarUaUVFRRmRkpLFu3TrDMAyjurraGDRokDFq1CiP1l67dq1x5513Gnl5eW6r6y3FxcVG7969jePHjxuGYRjl5eVGQUGBl2dlGKNHjzZOnjzp9joN9b99+3Zj3rx5bq9fw1P9onHjx483PvvsM5ePe6Gfs4KCAuPJJ590ed3fqjmGL1q0yMjKyrrkcS7UT1VVVb33O3LkiPHggw9edJ2GxrncOQ4ZMsQ5/tChQ91SwxNcUb9Pnz6GYbhv3/cWV+3rF6uh5yIqKsrYtm2bMXv2bLfPwTB+7TsqKsqYPXu2sW3btkseq6HXYrV98sknxq5du+q9/z/+8Q/jm2++ueT6TVWzL7dUF/p5X7FixWU932g6d+1z9Y1bXV1tVFdXu6Wet7GyAi7x008/qaqqSu3bt3drnR9//FGTJk3S6tWrZbfblZ2dreuvv17SuUSxdevWWrNmjUdr22w2zZs3Tx988IFb6nrTxo0b9cADDygoKEjSuU/zufvuu708K8lisWjTpk1ur9NQ/xs2bFBsbKzb69eo6belvht0IY899phbx6+srNSIESN0zz33aPLkyUpKSlJW
VpYsFovCw8P1zjvv6MiRI8rKytLDDz+sZ5991qX1G9rPrFarTp06paeeekrr16/XpEmTXFq3ttrH8NjYWG3YsEGSlJaWpp07dzZprMb6efbZZzVo0CD98ssvSkxMVExMjIYOHarS0lItXrxY27dvl9Vq1ZdfflnnXama/z///PNKSkpSQkKCPv30U0VERGjUqFG6/fbblZOTc9lz/OGHH9S1a1dJ0u7du9WnT58m9X6xj8PTTz+tyMhIPfHEE5JU7+OxdOlSrV69WmfOnJGPj4++++47bd++/aLfOWtK/bS0NL322muSpPfff99j7/TXVl5eroqKCrfXaWhfd6eGnos333xT/fr1cz4P7lS77zfffFNPPPGE+vXrd0ljNfZarLa9e/fqo48+Om97dXW1ZsyYoRtvvLFJdQ3D0OTJk2WxWBQdHa1du3Zp0KBBslqtzhWIaWlpGjZsmBISEmSxWFRUVKTFixfr4MGDslqtysnJcR5bS0tLNXToUEVFRWn06NEqLy+X3W5XfHy8HnjgAd1xxx2XtaKooqJC5eXll3z/pjDD6zXDMPTXv/5V0dHRuueee5SXl6eEhAQZhqFZs2ZpxYoVHpuLu183/JbdbtfUqVP11ltvqW/fvhozZoxOnTrlkrE//PBD/fnPf1Z0dLTuuOMO5768evVqJSUlafLkyYqLi9Pu3bs1YsQISar3OP/TTz8pLi5O8fHxSkpK8spx/lIQVsAlvvrqK3Xr1s3tdRo7GJ89e1YdO3b0Su0OHTrol19+cVttbzl+/Liz35pf8EOHDtWLL74oq9Va59/8+fM9Nq+bbrpJX375pdvrNNT/gQMHdNNNN7m9fo2afhctWqSrr77aY3XNYsmSJW4df8OGDbrllluUnZ2tO+64Q4ZhKDk5WVu3blVubq5ee+01hYaGKj4+XitWrNCCBQtcWr+h/azGggULNGrUKL3++usurVtb7WN47Z+vpKSkJr/gbayfQYMGacuWLVq2bJliYmKUk5Ojhx56SEuXLtXEiRMVFRUlu92uW2+9tcHxQ0JCtHHjRvn7+6u4uFirVq1Senq684/ty5njpk2bNGjQIElSVlaW7r333ib1fjE1JGnYsGHasWOH9uzZo5KSknofD4vFotzcXO3atUsxMTHKzc1Vbm6uIiMjXV7fm7744gs9/fTTiomJ8chcGtrX3amh5yI0NFQdOnRQYGCg2+dQu+/Q0FAFBgaqQ4cOlzRWQ6+HHn74YVksFlmtVh09elSLFy/WK6+8ori4OB09elSRkZEaNWqUXnjhBeepRQ2FAytXrlTfvn01fvx45/Hgv//9r1q3bq3c3Fxt27ZNL730kl5//XXZ7Xb98ssv2r17t6Rzr8k2btyo//u//9MLL7ygiRMnqkePHrLb7YqJiXH2sXTpUiUkJGj79u267bbbnG92VVRU6N1339X8+fOVmpp6SY+RJJWUlCgmJkZPP/20vvjii0se52Jc6PeIJ2RmZqpz587atm2bUlJStGbNGkVFRemxxx7T559/7tFThtz9uqE+VVVVWrhwofLz8/Xqq6/q2LFjLhk3MzNTs2fP1rZt2/TJJ5849+WxY8dKkv70pz9py5Yt6tKlS5371fd7ZsSIEcrKyqo3XDSrq7w9AaApfnswnjt3rnx9ffWf//xHhYWF8vPz80ptPz8/lx2UzOT666/XoUOHJEkxMTGKiYlR3759NW3aNE2bNs3Ls3O/hvq/9tprvTwzuNLXX3/tfAe9T58+2rBhg7766ivFxcVJOvduxIkTJ9xWv6H9rFOnTm6r6U6N9XPXXXdJkr788kt9/PHHevPNN1VRUSGLxdLomEat6wrUjCFJf/jDH3TVVVcpJCREJ0+evOw5btmyRa+++qqkcysrnnvuuYse82JrdOrUSXfeeack6fe//71++umneh+Pnj17av/+/dqxY4dmzpyp1atXq7CwUE8//bTL67dq1cp5P8MD13CoqKhQWlqa1q1bpxtuuEEP
P/ywFi5c6Pa63tLQc9Fc1fd66Oqrr1ZJSYny8/PVqlUrVVdXa+LEiTp16pSeeOIJHT16VEVFRcrOzla7du2UlJTkHK+iokJZWVn64IMPlJqaqhdffFEvv/yydu3apdOnT+uGG26QJO3fv19RUVHO+x04cEB/+ctfJEllZWXOoLHmeH7XXXfplVdeabCPr7/+Wo888ojztvn5+QoNDdUf//hHSWryceW3AgMDlZeXp4KCAr3yyiv69ttvNWLECCUlJalt27aXPG59GtrH5s6dq5ycHH3//fdq3769/P39NXbsWD366KMurS+dO66/++672rFjhwzDUEhIiJKTkxUUFKTs7GyX1zObsrIyBQcHy8fHRz4+Pk1eOdSQyZMna968eVq1apUeeuih875f+3dibb89ztfe3/v06aPPPvvMJfNzN1ZWwCVuueUWHT161O11rr/+ehUVFUk6dzC22+06fvy4MjMz9cADDygxMdHjtSU5l2WNHj3abfXr4+6AJCEhQe+++66zz5oLSnp7ZcU333yjXr16ub1OQ/336NFD33zzjdvr16jpt6ioyGMXgzMTd+/n3bt31yeffCJJ+uSTTxQYGKiePXtq8+bNstvt2rt3r6677jq1bdtWVVVVLq/f0H7mSbWP4bV/vhwOh37++ecmjdVYP61bn3vZ0bNnTz355JOy2+3Kz89XcnLyeY9vmzZtVFZWprKysjo/bzVjSLrkP7Lrm2N1dbVKS0vl7+8vh8Ohzp0716nVVI09Dr+dd32PR6tWrRQQEKD8/HxZLBZ9//33Onv27EW/G96U+p07d3b+nNVcXNSdysrK9MYbbyg0NFQTJ0706HL1hvZ1dzLbz/jlqu/10IkTJzR58mSNGzdOTz31VL3HjTvuuEPt2rU7b/tvw4ETJ04oJCREPj4+CggIcK4I6dWrl3bs2OG8X48ePbRy5UrZ7Xbt3r1bQ4YMkSTn8Xz37t3q3r27pLr7fI3u3bs7T1P5+OOPFRYWdt5tXfE7Nzw8XBMnTlRoaKjeeOMNlZWVXfaYv9XQPjZr1izZ7XbNmDFDixYtkt1ud0tQIZ07rttsNtntdm3fvl0rVqzQs88+q5dffllz5851y+9PM/nd736nY8eOqby8XA6HQ0eOHHHJuH5+fsm0oqkAAAR7SURBVHrttde0YsUKTZ8+/bx9uaHfU7/dj3/7Wqe5IKxoQbx5FVh/f3+1bt3a7adCNHQwHjx4sNavX6+3337b47Wlc+dlhoWFue16GfWprKzUmDFj3Frjmmuu0RtvvKGxY8cqOjpagwYN0pQpUzRt2jTZ7fY6/2bMmOHWudSWm5vrfAfFnRrqf9iwYdqyZYvb69eo6TcxMdFj57+ahSf282HDhunAgQMaOHCgdu3aJR8fHz333HOKjY1VdHS0852Me++9V1OmTFFKSopL6ze0n3lS7WP4li1bdP/990uSFi5c2ORrVlxMP48++qi2bNnifAdw8+bNCgoK0pkzZzRixAgdOnTIeW76M8884/Ilq/XNcfjw4c53qDZv3nzZ16VpyvNa3+MhSQMGDHCe3njdddc53ylzdf177rlHBQUFSkhI0LffftvETpsuICBAe/bs0aRJk7R8+XJFR0dr0aJFOnv2rNtrN7Svu5PZfsYvV32vh6qqqmSz2fT222+ra9euWr9+/XkB5MX+UdWlSxfnH30nT550hiz33XefKisrNWDAAEVHR2vq1Kl6/PHHFR0drdjYWOd8ysvLFR8fr+TkZOc1hnr06KEHH3xQ+fn5zlqPPPKIMjMzFRUVpc8++8zlbzidPXtWixYtUnR0tJYvX65JkyZpz549CggIcGkdyRz72H333acff/xR0dHRiomJ0dKlS9W2bVtNnDhRw4YN04svvuixuXj6mhXSuYB9ypQpCg8P15NPPqnQ0FCXjLtkyRJFRkbKarUqKSlJ0dHRuv/++5t8vZ0JEyZo7dq1GjRokI4cOeLy1T1u442reqJlev/9941l
y5a5vU5eXp4RFRVlWK1WIyYmxnjrrbcMw2j6leRdWXvbtm3GM88849bav7Vr1y5j6dKlHq1pBt99953x1FNPeXsahs1mMyoqKtxep6bfyspK47HHHnN7PbPx1H5eXl5uGIZhLFmyxJg/f77b65lRzTHcZrM5P2nj8ccf98h+bgYffvih8fXXXxuGYRhZWVlGcXGxl2d05Thz5oyxatUq48cff/RIvfr29SuBK1+n1fd6KDIy0rmtqKjIOHz4sBEREWGMGTPmvNdoNZ8wU/v102effWaMHz/eMAzDSEtLM/r06WMkJiYat95660XPa8WKFcarr77qkh4v148//misWrXKI5+UB1xIVVWVUVlZaRiGYfz973831qxZ4+UZXZxWhnEFrilGi/TDDz/ooYce8sp5ce+995727NmjuXPnerw24C4lJSXnvev40ksvXdYnJJhVQkKCTp06JR8fH61du9Yt73wBAC5ORUWF2rZtK4fDofj4+Ho/VaQ+aWlpzutkAPjV6dOnFR8fL8MwdO2112rNmjX1npZlNoQVaFHGjx+vX375RWvXrvVYzfT0dP3rX//SK6+84jzvEgAAAJfmn//8p9avX6+ysjLNmzfvsj6ZB0DzRVgBAAAAAABMhQtsAgAAAAAAUyGsAAAAAAAApkJYAQAAAAAATIWwAgAAAAAAmMr/A1fKdro4K+wkAAAAAElFTkSuQmCC
" />
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABCUAAADQCAYAAAAwGNsrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de1zUdb7H8TdXNT2AqLmotLbJ0bQwj5dVFJhQUBE9eQG8YdiW6Vqba26U69GU1dRt3babXRTxpKWMGbreQRwuYpqWprWaudnBW5tBoJZxm/OHD2YhuTPjT/T1/At+M/P9fL/f+f1+85v3/H4zTlar1SoAAAAAAIAbzNnoDgAAAAAAgNsToQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADCEq9EdkKSePXvqnnvuMbobAAAAAADAQU6dOqVDhw5VWHZThBL33HOPkpKSjO4GAAAAAABwkKioqOuWcfkGAAAAAAAwRLWhxDfffKOAgAAFBwcrJCRE58+fl8lkUmBgoEwmk9555x1J0oULFxQWFqb+/ftrzZo1kqSSkhI98sgjCgwM1IwZMxw/EgAAAAAA0KhUG0q0bt1aWVlZSk9P16RJk7Ry5UpJ0vbt22WxWBQTEyNJWrJkiZ555hmlp6frtdde09WrV7Vlyxa1a9dOmZmZunLlivbt2+f40QAAAAAAgEaj2lDCxcVFzs7X7nLp0iV169ZNzs7OCg8P14gRI/T1119Lkg4cOKCQkBC5urqqV69eOnbsmLKzsxUWFiZJGjJkiPbu3VuhbbPZrKioKEVFRSknJ8cRYwMAAAAAADexGr/o8vDhw3r88cf1/fffa9euXTKbzWrVqpXS09P15JNPavPmzSoqKrKFF56ensrNzVVeXp48PDwqLCsvMjJSkZGRkir/sgsAAAAAAHBrq/GLLh944AHt379f8fHxeuGFF9SqVStJUnBwsM6dOydJcnNzU2lpqSQpPz9f3t7e8vLyUkFBQYVlAAAAAAAAZaoNJQoLC21/e3p66o477rAFDZ9//rlatmwpSerdu7csFouKi4t16NAhdevWTQEBAUpNTZUk7dy5U/3793fUGAAAAAAAQCNU7eUbhw8f1qxZs+Ti4qKmTZsqISFBISEhatasmSTptddekyTFxcVp0qRJmjNnjqZOnapmzZopIiJCycnJCgwMVI8ePdSvXz/Hj+YG6/jsVsNqn148zLDaAAAAAADYQ7WhRJ8+fZSRkVFh2cGDB6+7n4+Pj1JSUio27OqqxMTEhvcQAAAAAADckmr8TgkAAAAAAABHIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGIJQAAAAAAACGqDaU+OabbxQQEKDg4GCFhITo/PnzysrKUkBAgAYMGKCjR49Kki5cuKCwsDD1799fa9askSSVlJTokUceUWBgoGbMmOH4kQAAAAAAgEal2lCidevWysrKUnp6uiZNmqSVK1fqj3/8o7Zu3ap3331XcXFxkqQlS5bomWeeUXp6ul577TVdvXpVW7ZsUbt27ZSZmakrV65o3759N2RAAAAAAACgcXCt7kYXFxfb35cuXdI999yjtLQ0tWzZUi1btlRubq4k6cCBA/rLX/4iZ2dn9erVS8eOHVN2draGDRsmSRoyZIj27t2rfv362dozm80ym82SpJycHLsP7FbX8dmthtU+vXiYYbUBAAAAALeOGr9T4vDhw/r1r3+tV199VQEBAfLw8LDd5urqqsLCQhUVFcnZ+VpTnp6eys3NVV5enu2+ZcvKi4yMVFJSkpKSkuTr62vPMQEAAAAA
gEag2jMlJOmBBx7Q/v37lZSUpIULF6qgoMB2W3Fxsdzd3eXm5qbS0lI5OzsrPz9f3t7e8vLyst23bBkAAAAAAECZas+UKCwstP3t6empFi1aqLi4WN9//71ycnJsQUPv3r1lsVhUXFysQ4cOqVu3bgoICFBqaqokaefOnerfv78DhwEAAAAAABqbas+UOHz4sGbNmiUXFxc1bdpUCQkJOnnypMLDw+Xk5KTXX39dkhQXF6dJkyZpzpw5mjp1qpo1a6aIiAglJycrMDBQPXr0qPB9EgAAAAAAANWGEn369FFGRkaFZT4+PsrOzr5uWUpKSsWGXV2VmJhon14CAAAAAIBbTo1fdAkAAAAAAOAIhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQ1YYSBw4cUL9+/RQUFKRx48apqKhIfn5+MplMMplMSklJkSQdP35cQUFBCggI0O7duyVJV65c0ahRozRgwAAtXbrU8SMBAAAAAACNSrWhhK+vr9LS0pSRkaGOHTtq06ZN8vT0lMVikcViUWhoqCRp9uzZWrlypXbs2KG5c+dKklasWKHw8HBlZWUpLS1NZ8+edfxoAAAAAABAo1FtKOHj46NmzZpJktzd3eXs7KzLly8rODhY48ePV25uriTp3Llz8vPzk4eHh7y9vXXx4kVlZ2crLCxMkhQaGqp9+/Y5eCgAAAAAAKAxqdV3Snz99dfatWuXhg8frr179yo9PV1DhgzRvHnzJEmlpaW2+3p6eio3N1d5eXny8PCosKw8s9msqKgoRUVFKScnx17jAQAAAAAAjUSNoURBQYFiYmKUmJgoNzc3tWrVSpI0ZswYHTly5Fojzv9uJj8/X97e3vLy8lJBQUGFZeVFRkYqKSlJSUlJ8vX1tduAAAAAAABA41BtKFFcXKyxY8dq3rx56ty5swoLC/XTTz9JkjIzM9WpUydJ1y7zOHXqlC5duqTc3Fy1bt1aAQEBSk1NlSSlpqaqb9++Dh4KAAAAAABoTFyru/G9997T/v37FR8fr/j4eE2bNk1Lly5V8+bN1aRJEyUkJEiSFi5cqNjYWJWUlGj+/PmSpEcffVQTJ05UQkKCIiIi1KFDB8ePBgAAAAAANBrVhhIxMTGKiYmpsCw6Ovq6+3Xt2lWZmZkVlrVo0ULJycl26CIAAAAAALgV1eqLLgEAAAAAAOyNUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABiCUAIAAAAAABii2lDiwIED6tevn4KCgjRu3DgVFRXJbDYrICBAAwcO1JkzZyRJx48fV1BQkAICArR7925J0pUrVzRq1CgNGDBAS5cudfxIAAAAAABAo1JtKOHr66u0tDRlZGSoY8eO2rRpk5YtWyaLxaIFCxYoPj5ekjR79mytXLlSO3bs0Ny5cyVJK1asUHh4uLKyspSWlqazZ886fjQAAAAAAKDRqDaU8PHxUbNmzSRJ7u7uOnHihO699165u7urf//++vTTTyVJ586dk5+fnzw8POTt7a2LFy8qOztbYWFhkqTQ0FDt27fPwUMBAAAAAACNiWtt7vT1119r165dWrx4sb799lvb8pKSEklSaWmpbZmnp6dyc3OVl5cnDw+PCsvKM5vNMpvNkqScnJyGjQIAAAAAADQ6NYYSBQUFiomJUWJiokpKSlRQUGC7zcXFRZLk7PzvEy7y8/Pl7e0tLy8v
FRQUyMvLS/n5+frlL39Zod3IyEhFRkZKkqKiouwyGAAAAAAA0HhUe/lGcXGxxo4dq3nz5qlz587y8/PTP/7xDxUWFio7O1v+/v6Srl3mcerUKV26dEm5ublq3bq1AgIClJqaKklKTU1V3759HT8aAAAAAADQaFR7psR7772n/fv3Kz4+XvHx8Zo2bZpmzJghk8mkpk2bavXq1ZKkhQsXKjY2ViUlJZo/f74k6dFHH9XEiROVkJCgiIgIdejQwfGjAQAAAAAAjUa1oURMTIxiYmKuWx4dHV3h/65duyozM7PCshYtWig5OdkOXQQAAAAAALeiai/fAAAAAAAAcBRCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYAhCCQAAAAAAYIhqQ4n8/Hz16dNHLVq00LFjxyRJfn5+MplMMplMSklJkSQdP35cQUFBCggI0O7duyVJV65c0ahRozRgwAAtXbrUwcMAAAAAAACNTbWhxB133KGtW7dqzJgxtmWenp6yWCyyWCwKDQ2VJM2ePVsrV67Ujh07NHfuXEnSihUrFB4erqysLKWlpens2bMOHAYAAAAAAGhsqg0l3Nzc1KZNmwrLLl++rODgYI0fP165ubmSpHPnzsnPz08eHh7y9vbWxYsXlZ2drbCwMElSaGio9u3b56AhAAAAAACAxqjO3ymxd+9epaena8iQIZo3b54kqbS01Ha7p6encnNzlZeXJw8PjwrLyjObzYqKilJUVJRycnIaMgYAAAAAANAI1TmUaNWqlSRpzJgxOnLkyLVGnP/dTH5+vry9veXl5aWCgoIKy8qLjIxUUlKSkpKS5OvrW+8BAAAAAACAxsm1LncuLCyU1WpVkyZNlJmZqU6dOkmSfHx8dOrUKd15553Kzc1V69atFRAQoNTUVD3yyCNKTU3V22+/7ZAB4Mbr+OxWw2qfXjzMsNoAAAAAAPuqMZQIDw/X4cOHdeLECT300ENKSkpS8+bN1aRJEyUkJEiSFi5cqNjYWJWUlGj+/PmSpEcffVQTJ05UQkKCIiIi1KFDB8eOBAAAAAAANCo1hhLbtm2r8H9cXNx19+natasyMzMrLGvRooWSk5Mb2D0AAAAAAHCrqvN3SgAAAAAAANgDoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADAEoQQAAAAAADCEq9EdAOqi47NbDat9evEww2oDAAAAwK2o2jMl8vPz1adPH7Vo0ULHjh2TJJnNZgUEBGjgwIE6c+aMJOn48eMKCgpSQECAdu/eLUm6cuWKRo0apQEDBmjp0qUOHgYAAAAAAGhsqg0l7rjjDm3dulVjxoyRJBUXF2vZsmWyWCxasGCB4uPjJUmzZ8/WypUrtWPHDs2dO1eStGLFCoWHhysrK0tpaWk6e/asg4cCAAAAAAAak2pDCTc3N7Vp08b2/8mTJ3XvvffK3d1d/fv316effipJOnfunPz8/OTh4SFvb29dvHhR2dnZCgsLkySFhoZq3759DhwGAAAAAABobOr0nRJ5eXny8PCw/V9SUiJJKi0ttS3z9PRUbm5uhfuWLSvPbDbLbDZLknJycurXewAAAAAA0GjVKZTw8vJSQUGB7X8XFxdJkrPzv0+4yM/Pl7e3t+2+Xl5eys/P1y9/+csKbUVGRioyMlKSFBUVVe8BAAAAAACAxqlOPwnq5+enf/zjHyosLFR2drb8/f0lST4+Pjp16pQuXbqk3Nxc
tW7dWgEBAUpNTZUkpaamqm/fvvbvPQAAAAAAaLRqPFMiPDxchw8f1okTJ/T4449rxowZMplMatq0qVavXi1JWrhwoWJjY1VSUqL58+dLkh599FFNnDhRCQkJioiIUIcOHRw7EgAAAAAA0Kg4Wa1Wq9GdiIqKUlJSktHdqLOOz241ugu4TZxePMzoLgAAAABAg1T23r9Ol28AAAAAAADYC6EEAAAAAAAwBKEEAAAAAAAwBKEEAAAAAAAwBKEEAAAAAAAwBKEEAAAAAAAwhKvRHQBQMyN/fpafIwUAAADgKJwpAQAAAAAADEEoAQAAAAAADMHlGwCqxaUjAAAAAByFMyUAAAAAAIAhCCUAAAAAAIAhCCUAAAAAAIAhCCUAAAAAAIAhCCUAAAAAAIAhCCUAAAAAAIAhCCUAAAAAAIAhXI3uAABUpeOzWw2rfXrxMMNqAwAAALeLOocSp0+fVu/evdWtWzdJktlslsVi0V//+lc1a9ZMq1evVocOHXT8+HFNmTJFxcXFio+P18CBA+3eeQBwFCMDkdsVQRAAAMDtp15nSgQHB2vDhg2SpOLiYi1btkzp6en66KOPFB8frzfffFOzZ8/WypUr1bZtWw0dOpRQAgAAAAAAVFCvUGLv3r0KDAxUYGCgYmJidO+998rd3V39+/fXrFmzJEnnzp2Tn5+fJMnb21sXL15U69atbW2YzWaZzWZJUk5OTkPHAQAAAAAAGpk6f9Glj4+PvvzyS2VkZOhf//qXNm7cKA8PD9vtJSUlkqTS0lLbMk9PT+Xm5lZoJzIyUklJSUpKSpKvr299+w8AAAAAABqpOocSTZo0UfPmzeXk5KRRo0bpyJEjKigosN3u4uJyrWHnfzedn58vb29vO3QXAAAAAADcKup8+calS5f0H//xH5KkzMxMDRs2TG+88YYKCwt18OBB+fv7S7p2RsWpU6d05513Kjc3t8KlGwAA/By/tgIAAHD7qXMokZWVpTlz5uiOO+7Q3Xffrfj4eDVt2lQmk0lNmzbV6tWrJUkLFy5UbGysSkpKNH/+fLt3HAAAeyEQAQAAMEadQ4mhQ4dq6NChFZZFR0crOjq6wrKuXbsqMzOzYb0DAAAAAAC3rHr9+gYAALAPztIAAAC3M0IJAABuUwQiAADAaIQSAADghiMQAQAAEqEEAAC4zRgZiBiJMAYAcDMilAAAALgN3K5hzO2KEApAY0EoAQAAANxiuEQKQGNBKAEAAADAbm7Xs3IIY4D6cTa6AwAAAAAA4PbEmRIAAAAA0ECcIQLUD6EEAAAAAKBebtcwxki3WhDE5RsAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQhBIAAAAAAMAQDg0l4uLiFBgYqJiYGBUVFTmyFAAAAAAAaGQcFkocOXJEZ8+eVWZmprp06aINGzY4qhQAAAAAAGiEXB3VcHZ2tsLCwiRJQ4YM0apVqzRu3Djb7WazWWazWZJ08OBBRUVFOaorDtPHwNo5OTny9fWlNrWpTW1qU5va1KY2talNbWrfRrX79fuTYbUb6tSpU9cvtDrIwoULrR988IHVarVaT548aR03bpyjSt2WIiMjqU1talOb2tSmNrWpTW1qU5va1G7UHHb5hpeXlwoKCiRJ+fn58vb2dlQpAAAAAADQCLk8//zzzzuiYTc3N61du1YjR45UYmKi/P39df/99zui1G2rW7du1KY2talNbWpTm9rUpja1qU1tajdaTlar1eqoxv/whz/oww8/1F133aVVq1bJ3d3dUaUAAAAAAEAj49BQAgAAAAAAoCoO+04J2MfkyZON7sJNwah5MHL+b9fn/kaO+2aY49utDzfDeG8WzIVj3M7z6sixN5Z55XX7Gkf35WYaa5mbsU+OdLuNt4wR475RNW/X51QilLhp7d27Vw8++KC++uorBQcH6/3331diYqIKCwslSc8//7y2bNlicC8dr7J5MKpu
+fm/0bUrc/jwYfXp00dPP/20w/t0I1Q27hkzZujHH3+8IbXq49ixY4qNjbVbH3bu3KnevXvrz3/+c73atEcfbuSc17RPu3DhgubNm2f3vhitunn/4YcfZDKZNGjQoHq3f7se1NwM25RR7HnM8NZbb9XYdn0cPnxYBw4ckCSdPn1aY8aMqXMbs2bNksViqfS2yvppMpl0+fLlevW3Low6VqlLX8aOHavi4mKHtV+Vxx9/XJJksVj0xRdf2P6eNWtWg/tSXZ/K9qnl1ztHSE5O1r/+9S+HtV+Z2j4HsbGxOnbsmN3nWzLmdcaI7awhNU+fPq1du3Y5vE7Z89zoGfvjH6jMxYsXrf7+/tZz585ZrVartbCw0JqdnW0NDg62Xrp0yWq1Wq3z5s2z/v3vfzeymw5X1TwYVbf8/N/o2pVZtGiRdePGjQ7tz41yI59re9Y6evSo9eGHH7ZbH6ZMmWL9+OOPa3x8aWmptbS0tM51a9MHR2Cf9m81zXt2drb1d7/7Xb3azsrKsppMJmtwcLA1KCjIumHDhkrvt3379ltm31GmLttUSUmJEV10GHtvXz179qyx7fpYtWqV9ZVXXrFarVbrV199ZR09enSd23j66aete/bsuW55Y3nddrSq+pKenm7905/+5LD2a6P8Orhnzx7r008/3eD+1KZP5de7hqps3/Hwww9bjx49Wu/H11VdnoOyvtlzvit7nblZt7OGHis1dNuu7bw3tE5d1sGbGWdK3IS2bdumkSNHysfHR9K1XzKRrn3KMHToUC1btkyStH79eoWHhys4ONj2yeaiRYsUHBysoKAgHT161JgB2Ell89CvXz/9+c9/lslk0n/9138pJSXlhtSVKs7/mTNnNGjQIAUFBemJJ55waO1+/fppz5496tu3r/r27av//d//1eeff64333xTc+fOve5TrcaoqnGXfcq1adMm9enTRw8++KCWL1/ukFqVrVexsbGaOnWqQkND9dBDD8lqtaq4uFhRUVEaNGiQ/vrXv9qtDz/++KM2bdqkKVOmaPPmzdc952X9mT59usLCwnTx4kWHzEPZnCcmJmr06NEaPny4evfurfPnz9u1llTzPq38p6mTJ09WYGCgTCaTTp8+3YCRX+/DDz/Ur3/9az344INy0A9S2dQ070899ZQ2btyo3/72t3Vq97vvvtNvf/tbvfvuu7JYLEpNTVW7du0qve+QIUM0cuTIBo+loco+QbWH2mxTJpNJzzzzjAYPHqyCggKNGDFCwcHBGjt2rAoLC2WxWDR48GCNHDlS3bt31/r16zV48GD16dNH3333nd36am/13b5KS0s1aNAgBQcHKzQ0VAUFBVq+fLlOnDghk8mktLS0atfXmTNnqm/fvnr++ef15JNPqlevXnrppZckSf/85z81ePBgmUwm/f73v5ckLV++XH/7298UFhYmSTp//ryio6N1//33Ky0tTZIq3e8dOXJEvXv3VkREhD799NNaz0G/fv0kSc8995yCgoL01FNPSZKuXr2qiRMnKiQkRCNGjLD9fL09578uryn2VFVfkpOTFRoa6pD2u3fvrmHDhkmSJk2apAULFkiSTCaTJKlXr1768ccflZiYqOeee06TJk2SdO1Mw7JtrSGf8ta0T/35eldXFotFw4cP18iRI/XGG28oMDBQAQEBeu+99/TVV19px44dmjx5sp555hklJibq1VdflSRt2bLF9nrStWtXTZ48WTNnzmzwOmDUsbFUt9cZe6vt8bFkv2OlmvZ/5d8DVPaasnz5cq1fv14mk0m5ubl1qvOrX/1KERERtvsMGjRI+fn51T7P5c+IKX8m744dOyqstzcrQomb0Llz52wrZlpamkwmk1544QU98MAD2r59u2bOnClJ8vPz07Zt29S3b1+lpKTo2LFjOnHihNLT07Vu3TrNmTPHyGE0WGXzMGLECE2fPl0Wi0U7duzQn/70pxtS9+fzv3jxYs2aNUsZGRn68ccflZGR4bDaI0aM
0HPPPactW7YoMzNTL7/8su6++27FxsbqhRde0JQpU+xS20hVjbvMhg0blJiYqD179jT4jUxd16uAgAClpKSoSZMmOnr0qJKTk9WpUyelpqaqd+/eduvDSy+9pCFDhmjVqlWVPudlwWPZC1GbNm0cMg/leXp66u9//7seeeQRmc1mu9aqzT6tTFFRkU6cOKGMjAxZLBbddddd9e5LZbZu3ap58+Zpz549mjt3rl3b/rma5n3p0qWKjo7W66+/Xqd2qzp42rVrl3r06KHIyEgFBQXp9OnTtgPnDRs2aMmSJZKky5cvKyQkRJKUmJhoO4Ape7NY2UFYQ7355pt2aUeq3TYlSYMHD1ZKSoreeusthYeHKz09Xd26ddO6deskSaWlpfrggw80ffp0rVu3Tjt37tSECRO0adMmu/XV3uq7fTk7O2vz5s1KT09XeHi41q9fr2nTpqlz586yWCwKCQmpdn0dPXq0srOztWLFCv3mN7/Rhx9+qHfeeUeS9Oyzz+r111+XxWLR1atXdfDgQU2bNk1PPfWU7XTmixcvau3atUpKSrK9katsvzdnzhytWbNGmzdvrvJSjOr6OXz4cGVkZOibb77Rxx9/rBUrVigkJERpaWmaMGFCg4P9hr6m2FNVfTl+/Lh+9atfOaT9sWPH6urVqyopKdFPP/2kzz77TGfOnKmwn27WrJntmKXsjWNRUZE++OADLV68WAkJCXbtU/l96s/Xu/rIz8/Xxo0btXbtWu3evVuZmZl69dVXddddd9n2MUuXLq3y8WfOnNGyZctsoV1D1gGjjo2luoV/o0aN0pkzZyRduyTs7bffblDt2h4f2/NYqbp166GHHlJGRoYOHTqk/Pz8Sl9Tpk2bpujoaFksFnl7e9epzmOPPaaSkhJ99913OnPmjDw8POTp6Vnn59lqtSo+Pr7CeltSUlLvOXEkQombULt27XT27FlJUkhIiCwWi86dO3fd/Xr06CFJ8vX1VV5enj7//HNlZ2fLZDJp/PjxN+Q6Skeqah7eeecdBQUFKSoqqkGf3ta1bnlffvml7Q1p7969dfLkSYfWLikpUevWreXm5qZOnTpVuj40ZjXN+f/8z//opZdeUkxMTIOvDa3revXz7ezLL79Uz549JaneoURt1rGqnh6/FJQAAAgaSURBVPP61qxPH34+dkfWqq6em5ubpk+frpiYGD311FP64Ycf6t2XykyfPl3btm3ThAkTtGPHDru2/XO1nYu6qurgae7cudq9e7fWrFmjnJycCo8ZNmyYtm3bJknavHmzRowYoe+++07r1q1TRkaGUlJSbJ96StcfhN1MajuvZdtPVftwf39/W3tlf7dv375B6395y5Ytk8lksut3XNR3+7p8+bIee+wxBQcHKyEhodLHVNe2v7+/nJ2d9Ytf/ELdu3eXq6ur7SyN48eP6ze/+Y1MJpMOHDhge3NS3n333SdXV9cK23tl+70LFy6oc+fOcnZ2tu1769LP8vvrkydP6vPPP9fy5ctlMpn08ssvN/iss4a+ptiTo/YvNbXfo0cPbdq0SR07dpSLi4vS0tIUGBhYbVsPPPCApBv3+tIQvXr10rfffqsvvvhCYWFhGjhwoL7//nt9++23Fe7n5ORk+7v8GRCdOnVSy5Ytbf83ZB0w6thYqlv4N3HiRL377ruSpPfff1+RkZENql3X42N7HCtVt26VPYft27fX999/36D3BVXVGT16tN5//32ZzWZFRUVJUrXPc2XrX23W25sFocRNKDw8XB988IFtxS/7YiI3N7cK6dbPV74uXbooODhYFovFlqI1ZlXNwyuvvKI9e/Zo/fr1dj/1sbq65ee/U6dOtjfHH330kfz8/Bxa29nZWRcvXlRRUZFOnjx5w06XK1PZAaU9VTXuMr6+vnrrrbe0ZMkSzZ492yG1qlqvfr6dderUSZ988okk6eDBg3btQ3lVPefOzvbZbdemD1UdYNmrVk37tDIlJSWKiorSmjVr1LZtW23c
uLHefamMp6enXn31Va1atUpxcXF2bfvnajPv9VHdAZu3t7eaNGmi++67r8JjmjVrprvuuktffPGFNmzYoKioKJ06dUqfffaZHnzwQQ0bNqzCwcvPD8JuJrWd17Ltp6p9ePl10F7rf3kzZ86UxWLRH/7wB7u0J9V/+9q5c6fuvvtupaenKzY21jbG8verbl6rmitJ6ty5s1avXi2LxaKDBw8qIiKiVtt7Zfu9tm3b6uTJk7Jarfr444/rNAeSKuyvO3XqpC5duuh3v/udLBaL9u7dq/j4+CpmtnYa+ppiT1X1pXPnzvrnP//psPYDAwO1cOFCBQYGqkePHvrb3/52XShR2/29vfpUVd36cHZ2VuvWrdWlSxft2rVLFotFhw8f1i9+8YsK7bds2dJ2vHTkyJEKjy+vIWM36thYqlv4FxERoe3bt+v//u//5OnpKS8vrwbVruvxsT2OlWq7/ys7Nvz5a0pt172q6owePVobN27U1q1bNXz4cEnVP8+VrX9Vrbc3I1ejO4DrtWrVSm+88YbGjx8vJycnOTs7a8aMGcrLy1NUVJRGjx5d6eP8/f3l5+en4OBgOTs7KzQ0tMFv4Cpz4cIFLV++XPPnz7d72+VVNQ9ZWVkaMGCA+vbtqxYtWtywuuXnPy4uTg8//LAWLVqk++67T0FBQQ6t7ePjo2HDhsnJyUlPPPGEmjVrZpd6tVFcXKxx48YpMzPTYTWqGveKFSskSfPnz9e+fftUWFioJ5980iG1artePfTQQ1q3bp0GDhyo//zP/7RrH1JTU233WbRokUOf85rm/EbUqmmfVubSpUv67//+bzk5OcnJyUlr1661a//efPNNbdy4UcXFxfX+NZXactS8h4eHy2QyaerUqWrXrp3toMbFxUV5eXlq3ry5Pvvss+seFx0drbfeeks//PCD2rVrJ3d3d/n7+2vLli1ycnJSUVGR7b72fjP1+OOP2+0SjtpsU+U99thjmjBhgtatW6e2bdsqLi5O2dnZdunLjVbf7atv375atGiRPvnkE7Vt29Z2un3nzp01evRozZw5U/3796/X+rpkyRJNnTpVV69elYuLixISEtSvXz9NmjRJ+/fv16JFiyp9XGX7vfj4eI0fP1533nlnhU+bazMHK1as0Pbt27VgwQJ1795dPXv2VLdu3TRlyhStWrVKkvT000/bvhOhPow6VqlLX3x9fZWSkqI+ffo4pP0BAwboyJEjGjBggNq0aaMXX3xRXbp0qfDYkJAQxcXFKS0tza7faVPTPrX8elf2yX19ODs7a86cOQoNDZWzs7PatGmjpKQkDR06VDNmzNCgQYP0+9//Xi+++KLCw8PVvn17tW/f3l7DtDFyfavqdUa6Fv4NGjRIBw8elMlkkru7u7p166a4uDhNmDChwbWNOD6uy+t1Za8pV69e1XPPPafIyEi9/fbbVQYzVdVp2bKlmjRpIm9vbzVv3lySNGDAgCqf5/vvv18//PCDQkNDbR9CVLXe3oycrI6K0wDcEg4cOKAjR47oscceM7orAKqxd+9e/fGPf7Qd1EyePFl33nmnnn32Wd199906f/68zGazUlJSdPnyZT3xxBMqKipS+/bttWDBAk2dOlXStdND3377bbm4uOj+++/Xyy+/LJPJpC1btqhFixYaM2aMXnzxRXXs2NHYAQOotejoaK1du1aurnweifqr7HVmxYoV6tmzpz766CN1795dr7zyiqRrZwwMGTJEFy5csF3aBVSFUAIAgFtUUVGR3Nzc9NNPP6l379765JNP5OLiYnS3AAC3uEOHDmnVqlW2L7IFqkNcCgDALSo5OVmvvfaaCgoKNGPGDAIJAIDDJScna/HixXa/3BK3Ls6UAAAAAAAAhuDXNwAAAAAAgCEIJQAAAAAAgCEIJQAAAAAAgCEIJQAAAAAAgCH+H+YrQ46jt0SiAAAAAElFTkSuQmCC
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">plot_hist</span><span class="p">(</span><span class="n">lens</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">n_bins</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="mi">50</span><span class="p">):</span>
<span class="sd">'''</span>
<span class="sd"> Plot a histogram of the given list of token counts</span>
<span class="sd"> :param lens: the list of token counts to plot</span>
<span class="sd"> :param n_bins: the number of bins to sort the token counts into</span>
<span class="sd"> '''</span>
<span class="n">n</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">patches</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="n">n_bins</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'blue'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now, let's look at the distribution of method and comment lengths.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">mthd_lens</span><span class="p">),</span> <span class="n">median</span><span class="p">(</span><span class="n">mthd_lens</span><span class="p">),</span> <span class="n">stdev</span><span class="p">(</span><span class="n">mthd_lens</span><span class="p">))</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">mthd_lens</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">cmt_lens</span><span class="p">),</span> <span class="n">median</span><span class="p">(</span><span class="n">cmt_lens</span><span class="p">),</span> <span class="n">stdev</span><span class="p">(</span><span class="n">cmt_lens</span><span class="p">))</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">cmt_lens</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>177 102.0 283.76574846164925
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYQAAAD4CAYAAADsKpHdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAO2klEQVR4nO3db6yed13H8ffHlg0FwlpXm7ZrbDGNSX3gmCejBGKm065rjJsJIVuMq3OmRkcCamI6eTCFJ2gUZREHFSrDwMbkj2uW6ayVhPiAsVOdW/en9mww1z9bC8OhkhgGXx/c3wM3pYeennP33Ken71dy576u7/W7rvP73b+zfXb9uc9SVUiS9APj7oAkaXEwECRJgIEgSWoGgiQJMBAkSW35uDvw/Vx66aW1YcOGcXdDks4rBw4c+HJVrTrb/RZ1IGzYsIHJyclxd0OSzitJnp3Lfl4ykiQBBoIkqRkIkiTAQJAkNQNBkgQYCJKkZiBIkgADQZLUDARJErDIv6k8X2vXnr5+7NjC9kOSzgeeIUiSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GSBBgIkqRmIEiSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpnTEQkqxP8tkkTyR5PMnbu74yyb4kh/t9RdeT5I4kU0keTXLF0LF2dPvDSXacu2FJks7WbM4QXgZ+t6o2A1uAW5NsBnYB+6tqE7C/1wGuBTb1aydwJwwCBLgdeANwJXD7dIhIksbvjIFQVcer6l97+b+BJ4F1wHXAXd3sLuD6Xr4O+GgNfB64JMka4BpgX1W9WFVfBfYB20Y6GknSnJ3VPYQkG4DXAw8Bq6vqeG96Hljdy+uA54Z2O9K1meqn/oydSSaTTJ48efJsuidJmodZB0KSVwOfAt5RVV8b3lZVBdQoOlRVu6tqoqomVq1aNYpDSpJmYVaBkOQVDMLgY1X16S6/0JeC6PcTXT8KrB/a/bKuzVSXJC0Cs3nKKMCHgSer6r1Dm/YC008K7QDuG6rf1E8bbQFe6ktLDwJbk6zom8lbuyZJWgSWz6LNm4BfAR5L8kjXfh94D3BvkluAZ4G39rYHgO3AFPB14GaAqnoxybuBh7vdu6rqxZGMQpI0b2cMhKr6FyAzbL76NO0LuHWGY+0B9pxNByVJC8NvKkuSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GSBBgIkqRmIEiSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GSBBgIkqRmIEiSAANBktTOGAhJ9iQ5keTgUO0PkhxN8ki/tg9tuy3JVJJDSa4Zqm/r2lSSXaMfiiRpPmZzhvARYNtp6n9WVZf36wGAJJuBG4Cf6H3+MsmyJMuA9wPXApuBG7utJGmRWH6mBlX1uSQbZnm864B7qur/gC8mmQKu7G1TVfUMQJJ7uu0TZ91jSdI5MZ97CG9L8mhfUlrRtXXAc0NtjnRtprokaZGYayDcCfwYcDlwHPjTUXUoyc4kk0kmT548OarDSpLOYE6BUFUvVNU3q+pbwF/xnctCR4H1Q00v69pM9dMde3dVTVTVxKpVq+bSPUnSHMwpEJKsGVr9JWD6CaS9wA1JLk6yEdgEfAF4GNiUZGOSixjceN47925LkkbtjDeVk9wNXAVcmuQIcDtwVZLLgQK+BPwGQFU9nuReBjeLXwZurapv9nHeBjwILAP2VNXjIx+NJGnOUlXj7sOMJiYmanJycs77r117+vqxY3M+pCQtekkOVNXE2e7nN5UlSYCBIElqBoIkCTAQ
JEnNQJAkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAA0GS1AwESRJgIEiSmoEgSQIMBElSMxAkSYCBIElqBoIkCTAQJEnNQJAkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAA0GS1AwESRJgIEiSmoEgSQIMBElSMxAkSYCBIElqBoIkCZhFICTZk+REkoNDtZVJ9iU53O8rup4kdySZSvJokiuG9tnR7Q8n2XFuhiNJmqvZnCF8BNh2Sm0XsL+qNgH7ex3gWmBTv3YCd8IgQIDbgTcAVwK3T4eIJGlxOGMgVNXngBdPKV8H3NXLdwHXD9U/WgOfBy5Jsga4BthXVS9W1VeBfXxvyEiSxmiu9xBWV9XxXn4eWN3L64Dnhtod6dpM9e+RZGeSySSTJ0+enGP3JElna943lauqgBpBX6aPt7uqJqpqYtWqVaM6rCTpDOYaCC/0pSD6/UTXjwLrh9pd1rWZ6pKkRWKugbAXmH5SaAdw31D9pn7aaAvwUl9aehDYmmRF30ze2jVJ0iKx/EwNktwNXAVcmuQIg6eF3gPcm+QW4Fngrd38AWA7MAV8HbgZoKpeTPJu4OFu966qOvVGtSRpjM4YCFV14wybrj5N2wJuneE4e4A9Z9U7SdKC8ZvKkiTAQJAkNQNBkgQYCJKkZiBIkgADQZLUDARJEmAgSJKagSBJAgwESVIzECRJgIEgSWoGgiQJMBAkSc1AkCQBBoIkqRkIkiTAQJAkNQNBkgQYCJKkZiBIkgADQZLUDARJEmAgSJKagSBJAgwESVIzECRJgIEgSWoGgiQJMBAkSc1AkCQBBoIkqRkIkiTAQJAkNQNBkgTMMxCSfCnJY0keSTLZtZVJ9iU53O8rup4kdySZSvJokitGMQBJ0miM4gzhZ6rq8qqa6PVdwP6q2gTs73WAa4FN/doJ3DmCny1JGpFzccnoOuCuXr4LuH6o/tEa+DxwSZI15+DnS5LmYL6BUMA/JjmQZGfXVlfV8V5+Hljdy+uA54b2PdK175JkZ5LJJJMnT56cZ/ckSbO1fJ77v7mqjib5EWBfkqeGN1ZVJamzOWBV7QZ2A0xMTJzVvpKkuZvXGUJVHe33E8BngCuBF6YvBfX7iW5+FFg/tPtlXZMkLQJzDoQkr0rymullYCtwENgL7OhmO4D7enkvcFM/bbQFeGno0pIkaczmc8loNfCZJNPH+XhV/UOSh4F7k9wCPAu8tds/AGwHpoCvAzfP42fPy9q1p68fO7aw/ZCkxWTOgVBVzwA/eZr6V4CrT1Mv4Na5/jxJ0rnlN5UlSYCBIElqBoIkCTAQJEnNQJAkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAA0GS1AwESRJgIEiSmoEgSQIMBElSMxAkSYCBIElqBoIkCTAQJEnNQJAkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAWD7uDiwma9eevn7s2ML2Q5LGwTMESRJgIEiSmoEgSQIMBElS86byLHizWdKFwDMESRJgIEiS2oJfMkqyDXgfsAz4UFW9Z6H7MCpeSpK0lCzoGUKSZcD7gWuBzcCNSTYvZB8kSae30GcIVwJTVfUMQJJ7gOuAJxa4H+fUTGcOZ8szDUkLaaEDYR3w3ND6EeANww2S7AR29ur/JDk0x591KfDlOe67KCRz3vW8H/scOe4Li+Oe2Y/O5cCL7rHTqtoN7J7vcZJMVtXECLp03rlQx+64LyyOe/QW+imjo8D6ofXLuiZJGrOFDoSHgU1JNia5CLgB2LvAfZAkncaCXjKqqpeTvA14kMFjp3uq6vFz9OPmfdnpPHahjt1xX1gc94ilqs7VsSVJ5xG/qSxJAgwESVJbkoGQZFuSQ0mmkuwad3/mK8n6JJ9N8kSSx5O8vesrk+xLcrjfV3Q9Se7o8T+a5IqhY+3o9oeT7BjXmM5GkmVJ/i3J/b2+MclDPb5P9AMKJLm416d6+4ahY9zW9UNJrhnPSGYvySVJ
PpnkqSRPJnnjhTDfSX67f8cPJrk7ySuX6nwn2ZPkRJKDQ7WRzXGSn0ryWO9zRzKLbzZV1ZJ6MbhZ/TTwOuAi4N+BzePu1zzHtAa4opdfA/wHgz/98cfArq7vAv6ol7cDfw8E2AI81PWVwDP9vqKXV4x7fLMY/+8AHwfu7/V7gRt6+QPAb/bybwEf6OUbgE/08ub+PbgY2Ni/H8vGPa4zjPku4Nd7+SLgkqU+3wy+uPpF4AeH5vlXl+p8Az8NXAEcHKqNbI6BL3Tb9L7XnrFP4/5QzsGH/EbgwaH124Dbxt2vEY/xPuDngUPAmq6tAQ718geBG4faH+rtNwIfHKp/V7vF+GLwXZX9wM8C9/cv95eB5afON4On197Yy8u7XU79HRhutxhfwGv7X4w5pb6k55vv/CWDlT1/9wPXLOX5BjacEggjmePe9tRQ/bvazfRaipeMTvfnMdaNqS8j16fFrwceAlZX1fHe9Dywupdn+gzOx8/mz4HfA77V6z8M/FdVvdzrw2P49vh6+0vd/nwb90bgJPDXfansQ0lexRKf76o6CvwJ8J/AcQbzd4ClP9/DRjXH63r51Pr3tRQDYclK8mrgU8A7quprw9tq8J8BS+oZ4iS/AJyoqgPj7ssCW87gUsKdVfV64H8ZXD74tiU63ysY/LHLjcBa4FXAtrF2aozGMcdLMRCW5J/HSPIKBmHwsar6dJdfSLKmt68BTnR9ps/gfPts3gT8YpIvAfcwuGz0PuCSJNNfqhwew7fH19tfC3yF82/cR4AjVfVQr3+SQUAs9fn+OeCLVXWyqr4BfJrB78BSn+9ho5rjo718av37WoqBsOT+PEY/HfBh4Mmqeu/Qpr3A9FMFOxjcW5iu39RPJmwBXurT0AeBrUlW9H+Nbe3aolRVt1XVZVW1gcE8/nNV/TLwWeAt3ezUcU9/Hm/p9tX1G/qplI3AJgY33BalqnoeeC7Jj3fpagZ/In5JzzeDS0VbkvxQ/85Pj3tJz/cpRjLHve1rSbb0Z3nT0LFmNu6bKufoRs12Bk/iPA28c9z9GcF43szg1PFR4JF+bWdwvXQ/cBj4J2Bltw+D/xHR08BjwMTQsX4NmOrXzeMe21l8BlfxnaeMXsfgH/Ap4G+Bi7v+yl6f6u2vG9r/nf15HGIWT1uM+wVcDkz2nP8dgydIlvx8A38IPAUcBP6GwZNCS3K+gbsZ3Cv5BoOzwltGOcfARH+OTwN/wSkPKZzu5Z+ukCQBS/OSkSRpDgwESRJgIEiSmoEgSQIMBElSMxAkSYCBIElq/w+NoUIDZ56FBwAAAABJRU5ErkJggg==
" />
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>17 12.0 19.77371993328519
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAO6klEQVR4nO3cf6zddX3H8edrVMEfGYVy09AfWTE0M2SZym6whmUxdHPAjOUPdBIjjWnSf9iGw0TLloxs+0eTRZRkIWsssyTGH0MTGmM0XcEs+0PkVhkCHePKxLYUetVStxmnne/9cT7FY7mXtvfce+6ln+cjOTmf7+fzOd/v53zSvs73fr7fc1JVSJL68GtLPQBJ0vgY+pLUEUNfkjpi6EtSRwx9SerIiqUewMu55JJLasOGDUs9DEl6Rdm/f/8PqmpitrZlHfobNmxgampqqYchSa8oSZ6Zq83lHUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6siy/kbuqNasmb3+2WfHOw5JWi4805ekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSPn9H36c/H+fUm98kxfkjpi6EtSRwx9SeqIoS9JHTH0Jakjpw39JPckOZrksaG6i5PsTfJUe76o1SfJXUmmkzya5Mqh12xt/Z9KsnVx3o4k6eWcyZn+p4FrT6nbAeyrqo3AvrYNcB2wsT22A3fD4EMCuAN4K3AVcMfJDwpJ0vicNvSr6l+AH51SvQXY3cq7gRuG6u+tgW8AK5NcCvwhsLeqflRVx4C9vPSDRJK0yOa7pr+6qo608nPA6lZeCxwc6neo1c1VL0kao5Ev5FZVAbUAYwEgyfYkU0mmZmZmFmq3kiTmH/rPt2Ub2vPRVn8YWD/Ub12rm6v+JapqZ1VNVtXkxMTEPIcnSZrNfEN/D3DyDpytwP1D9Te3u3g2AcfbMtDXgHckuahdwH1Hq5MkjdFpf3AtyWeBtwOXJDnE4C6cjwJfSLINeAZ4T+v+FeB6YBr4CfABgKr6UZK/BR5u/f6mqk69OCxJWmSnDf2qummOps2z9C3gljn2cw9wz1mNTpK0oPxGriR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSR0YK/SR/nuTxJI8l+WySC5JcluShJNNJPp/k1a3v+W17urVvWIg3IEk6c/MO/SRrgT8DJqvqt4DzgPcCHwPurKrLgWPAtvaSbcCxVn9n6ydJGqNRl3dWAK9JsgJ4LXAEuAa4r7XvBm5o5S1tm9a+OUlGPL4k6SzMO/Sr6jDwd8D3GYT9cWA/8EJVnWjdDgFrW3ktcLC99kTrv+rU/SbZnmQqydTMzMx8hydJmsUoyzsXMTh7vwxYA7wOuHbUAVXVzqqarKrJiYmJUXcnSRoyyvLO7wP/WVUzVfVz4EvA1cDKttwDsA443MqHgfUArf1C4IcjHF+SdJZGCf3vA5uSvLatzW8GngAeBG5sfbYC97fynrZNa3+gqmqE40uSztIoa/oPMbgg+y3gO21fO4GPALclmWawZr+rvWQXsKrV3wbsGGHckqR5WHH6LnOrqjuAO06pfhq4apa+PwXePcrxJEmj8Ru5ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdGSn0k6xMcl+Sf09yIMnb
klycZG+Sp9rzRa1vktyVZDrJo0muXJi3IEk6U6Oe6X8S+GpVvRF4E3AA2AHsq6qNwL62DXAdsLE9tgN3j3hsSdJZmnfoJ7kQ+D1gF0BV/ayqXgC2ALtbt93ADa28Bbi3Br4BrExy6bxHLkk6a6Oc6V8GzAD/mOTbST6V5HXA6qo60vo8B6xu5bXAwaHXH2p1vyLJ9iRTSaZmZmZGGJ4k6VSjhP4K4Erg7qp6C/A//HIpB4CqKqDOZqdVtbOqJqtqcmJiYoThSZJONUroHwIOVdVDbfs+Bh8Cz59ctmnPR1v7YWD90OvXtTpJ0pjMO/Sr6jngYJLfbFWbgSeAPcDWVrcVuL+V9wA3t7t4NgHHh5aBJEljsGLE1/8p8JkkrwaeBj7A4IPkC0m2Ac8A72l9vwJcD0wDP2l9JUljNFLoV9UjwOQsTZtn6VvALaMcT5I0Gr+RK0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUkRWj7iDJecAUcLiq3pnkMuBzwCpgP/D+qvpZkvOBe4HfAX4I/HFVfW/U4y+kNWtmr3/22fGOQ5IWy0Kc6d8KHBja/hhwZ1VdDhwDtrX6bcCxVn9n6ydJGqORQj/JOuCPgE+17QDXAPe1LruBG1p5S9umtW9u/SVJYzLqmf4ngA8Dv2jbq4AXqupE2z4ErG3ltcBBgNZ+vPX/FUm2J5lKMjUzMzPi8CRJw+Yd+kneCRytqv0LOB6qamdVTVbV5MTExELuWpK6N8qF3KuBdyW5HrgA+HXgk8DKJCva2fw64HDrfxhYDxxKsgK4kMEFXUnSmMz7TL+qbq+qdVW1AXgv8EBVvQ94ELixddsK3N/Ke9o2rf2Bqqr5Hl+SdPYW4z79jwC3JZlmsGa/q9XvAla1+tuAHYtwbEnSyxj5Pn2Aqvo68PVWfhq4apY+PwXevRDHkyTNj9/IlaSOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqyLxDP8n6JA8meSLJ40lubfUXJ9mb5Kn2fFGrT5K7kkwneTTJlQv1JiRJZ2aUM/0TwIeq6gpgE3BLkiuAHcC+qtoI7GvbANcBG9tjO3D3CMeWJM3DvEO/qo5U1bda+b+AA8BaYAuwu3XbDdzQyluAe2vgG8DKJJfOe+SSpLO2IGv6STYAbwEeAlZX1ZHW9BywupXXAgeHXnao1Z26r+1JppJMzczMLMTwJEnNilF3kOT1wBeBD1bVj5O82FZVlaTOZn9VtRPYCTA5OXlWr10sa9bMXv/ss+MdhySNaqQz/SSvYhD4n6mqL7Xq508u27Tno63+MLB+6OXrWp0kaUxGuXsnwC7gQFV9fKhpD7C1lbcC9w/V39zu4tkEHB9aBpIkjcEoyztXA+8HvpPkkVb3F8BHgS8k2QY8A7yntX0FuB6YBn4CfGCEY0uS5mHeoV9V/wpkjubNs/Qv4Jb5Hk+SNDq/kStJHTH0Jakjhr4kdcTQl6SOGPqS1JGRv5HbM7+pK+mVxjN9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjvgrm4vAX9+UtFx5pi9JHTH0Jakjhr4kdcTQl6SOeCF3jLzAK2mpeaYvSR0x9CWpIy7vLAMu+0gaF0N/GfPDQNJCG/vyTpJrkzyZZDrJjnEfX5J6NtYz/STnAX8P/AFwCHg4yZ6qemKc43il
m+svgJfjXweSYPzLO1cB01X1NECSzwFbAEN/kc3ng2IxLfaHkEtj0uzGHfprgYND24eAtw53SLId2N42/zvJk/M81iXAD+b52p4syTwl4z7iyMf139PpOUdnZhzz9BtzNSy7C7lVtRPYOep+kkxV1eQCDOmc5jydGefp9JyjM7PU8zTuC7mHgfVD2+tanSRpDMYd+g8DG5NcluTVwHuBPWMegyR1a6zLO1V1IsmfAF8DzgPuqarHF+lwIy8RdcJ5OjPO0+k5R2dmSecpVbWUx5ckjZG/vSNJHTH0Jakj51zo+zMPv5TkniRHkzw2VHdxkr1JnmrPF7X6JLmrzdujSa5cupGPV5L1SR5M8kSSx5Pc2uqdqyFJLkjyzST/1ubpr1v9ZUkeavPx+XaTBknOb9vTrX3DUo5/nJKcl+TbSb7ctpfNHJ1ToT/0Mw/XAVcANyW5YmlHtaQ+DVx7St0OYF9VbQT2tW0YzNnG9tgO3D2mMS4HJ4APVdUVwCbglvbvxrn6Vf8LXFNVbwLeDFybZBPwMeDOqrocOAZsa/23Acda/Z2tXy9uBQ4MbS+fOaqqc+YBvA342tD27cDtSz2uJZ6TDcBjQ9tPApe28qXAk638D8BNs/Xr7QHcz+D3oZyruefotcC3GHyj/gfAilb/4v9BBnfpva2VV7R+Weqxj2Fu1jE4SbgG+DKQ5TRH59SZPrP/zMPaJRrLcrW6qo608nPA6lZ27oD25/VbgIdwrl6iLVs8AhwF9gLfBV6oqhOty/BcvDhPrf04sGq8I14SnwA+DPyiba9iGc3RuRb6Ogs1OL3wnt0myeuBLwIfrKofD7c5VwNV9X9V9WYGZ7NXAW9c4iEtK0neCRytqv1LPZa5nGuh7888nN7zSS4FaM9HW33Xc5fkVQwC/zNV9aVW7VzNoapeAB5ksFSxMsnJL3oOz8WL89TaLwR+OOahjtvVwLuSfA/4HIMlnk+yjOboXAt9f+bh9PYAW1t5K4P165P1N7c7UzYBx4eWNs5pSQLsAg5U1ceHmpyrIUkmkqxs5dcwuO5xgEH439i6nTpPJ+fvRuCB9hfTOauqbq+qdVW1gUH+PFBV72M5zdFSX/RYhIso1wP/wWCt8S+XejxLPBefBY4AP2ewjriNwXrhPuAp4J+Bi1vfMLjz6bvAd4DJpR7/GOfpdxks3TwKPNIe1ztXL5mn3wa+3ebpMeCvWv0bgG8C08A/Aee3+gva9nRrf8NSv4cxz9fbgS8vtznyZxgkqSPn2vKOJOllGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI/8PZyVyMK7W8XgAAAAASUVORK5CYII=
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Using this new information on the length distribution, we can remove outliers by filtering out methods and comments whose lengths fall outside the 95th percentile (chosen for completely arbitrary reasons)!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">filter_len</span><span class="p">(</span>
<span class="n">row</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">:</span> <span class="n">AutoTokenizer</span><span class="p">,</span> <span class="n">mthd_len</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">cmt_len</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<span class="sd">'''</span>
<span class="sd"> Determine whether a given pandas dataframe row's method and comment both</span>
<span class="sd"> fall under the given max token lengths</span>
<span class="sd"> :param row: the row to check if it has a method or comment that is too long</span>
<span class="sd"> :param tokenizer: the tokenizer to tokenize a method or comment</span>
<span class="sd"> :param mthd_len: the max number of tokens a method can have</span>
<span class="sd"> :param cmt_len: the max number of tokens a comment can have</span>
<span class="sd"> :returns: whether the given row's method and comment both have fewer</span>
<span class="sd"> tokens than the max lengths</span>
<span class="sd"> '''</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">))</span> <span class="o"><</span> <span class="n">mthd_len</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">))</span> <span class="o"><</span> <span class="n">cmt_len</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="n">df_trn</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">filter_len</span><span class="p">(</span>
<span class="n">row</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">max_mthd_len</span><span class="p">,</span>
<span class="n">max_cmt_len</span>
<span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="n">df_val</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">filter_len</span><span class="p">(</span>
<span class="n">row</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">max_mthd_len</span><span class="p">,</span>
<span class="n">max_cmt_len</span>
<span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="n">df_tst</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">filter_len</span><span class="p">(</span>
<span class="n">row</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">max_mthd_len</span><span class="p">,</span>
<span class="n">max_cmt_len</span>
<span class="p">),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span>
<span class="p">)]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(2809, 88, 193)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">max_mthd_len</span><span class="p">,</span> <span class="n">max_cmt_len</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(559, 48)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>We could explore our data a lot more; the above was just the bare minimum. As an exercise, I suggest you explore the data on your own using whatever means necessary!</p>
</div>
</div>
</div>
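One quick check worth doing, for example, is looking at token-length distributions. The toy dataframe below is a hypothetical stand-in for <code>df_trn</code> (same <code>mthd</code> and <code>cmt</code> columns), and a whitespace split is used as a rough proxy for the subword tokenizer above:

```python
import pandas as pd

# Hypothetical toy frame standing in for df_trn (columns `mthd`, `cmt` as above).
df = pd.DataFrame({
    'mthd': ['public void foo ( ) { }', 'int bar ( int x ) { return x ; }'],
    'cmt': ['does nothing', 'returns its argument unchanged'],
})

# Whitespace split as a quick proxy for the real subword tokenizer.
mthd_lens = df.mthd.str.split().str.len()
cmt_lens = df.cmt.str.split().str.len()

# Summary stats reveal outliers that the length filter above would drop.
print(mthd_lens.describe())
print(cmt_lens.describe())
```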
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Training">Training<a class="anchor-link" href="#Training"> </a></h1><p>Now that we have our data processed and in a format we like, let's go ahead and start training! To accomplish this we will be using code from the awesome <a href="https://github.com/microsoft/CodeXGLUE">CodeXGLUE</a> repository. This repository is similar to the NLP equivalent GLUE benchmarks where a ton of awesome code related benchmarks are standardized and put into one place for the community to use! They have a ton of interesting ones and I highly suggest looking through their repo if you are interested in other code related tasks.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">cd</span> <span class="o">./</span><span class="n">CodeXGLUE</span><span class="o">/</span><span class="n">Code</span><span class="o">-</span><span class="n">Text</span><span class="o">/</span><span class="n">code</span><span class="o">-</span><span class="n">to</span><span class="o">-</span><span class="n">text</span><span class="o">/</span><span class="n">code</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>/content/CodeXGLUE/Code-Text/code-to-text/code
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Okay, I lied, sorry :(. One last processing step is required: writing the data out in the structure that the awesome CodeXGLUE Code-Text benchmark expects.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="n">df_trn</span><span class="p">[</span><span class="s1">'code_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="n">df_trn</span><span class="p">[</span><span class="s1">'docstring_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'java/train.jsonl'</span><span class="p">,</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">to_dict</span><span class="p">())</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="n">df_val</span><span class="p">[</span><span class="s1">'code_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="n">df_val</span><span class="p">[</span><span class="s1">'docstring_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'java/valid.jsonl'</span><span class="p">,</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df_val</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">to_dict</span><span class="p">())</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
<span class="n">df_tst</span><span class="p">[</span><span class="s1">'code_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="n">df_tst</span><span class="p">[</span><span class="s1">'docstring_tokens'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'java/test.jsonl'</span><span class="p">,</span><span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">to_dict</span><span class="p">())</span> <span class="o">+</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
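As a quick sanity check, each line of the resulting <code>.jsonl</code> files should round-trip through the <code>json</code> module. The row below is a hypothetical example mirroring the fields written above:

```python
import json

# Hypothetical row mirroring the fields written to train.jsonl above.
row = {
    'mthd': 'public void foo ( ) { }',
    'cmt': 'does nothing',
    'code_tokens': ['public', 'void', 'foo', '(', ')', '{', '}'],
    'docstring_tokens': ['does', 'nothing'],
}

# One JSON object per line: dumps on the way out, loads on the way back in.
line = json.dumps(row)
parsed = json.loads(line)
assert parsed['code_tokens'][0] == 'public'
```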
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">lang</span> <span class="o">=</span> <span class="s1">'java'</span> <span class="c1"># programming language</span>
<span class="n">lr</span> <span class="o">=</span> <span class="mf">5e-5</span>
<span class="n">batch_size</span> <span class="o">=</span> <span class="mi">8</span> <span class="c1"># change depending on the GPU Colab gives you</span>
<span class="n">beam_size</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">source_length</span> <span class="o">=</span> <span class="mi">256</span>
<span class="n">target_length</span> <span class="o">=</span> <span class="n">max_cmt_len</span>
<span class="n">data_dir</span> <span class="o">=</span> <span class="s1">'.'</span>
<span class="n">output_dir</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'model/</span><span class="si">{</span><span class="n">lang</span><span class="si">}</span><span class="s1">'</span>
<span class="n">train_file</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">lang</span><span class="si">}</span><span class="s1">/train.jsonl'</span>
<span class="n">dev_file</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">lang</span><span class="si">}</span><span class="s1">/valid.jsonl'</span>
<span class="n">epochs</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">pretrained_model</span> <span class="o">=</span> <span class="s1">'microsoft/codebert-base'</span>
<span class="o">!</span> python run.py <span class="err">\</span>
<span class="o">--</span><span class="n">do_train</span> \
<span class="o">--</span><span class="n">do_eval</span> \
<span class="o">--</span><span class="n">do_lower_case</span> \
<span class="o">--</span><span class="n">model_type</span> <span class="n">roberta</span> \
<span class="o">--</span><span class="n">model_name_or_path</span> <span class="p">{</span><span class="n">pretrained_model</span><span class="p">}</span> \
<span class="o">--</span><span class="n">train_filename</span> <span class="p">{</span><span class="n">train_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">dev_filename</span> <span class="p">{</span><span class="n">dev_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">output_dir</span> <span class="p">{</span><span class="n">output_dir</span><span class="p">}</span> \
<span class="o">--</span><span class="n">max_source_length</span> <span class="p">{</span><span class="n">source_length</span><span class="p">}</span> \
<span class="o">--</span><span class="n">max_target_length</span> <span class="p">{</span><span class="n">target_length</span><span class="p">}</span> \
<span class="o">--</span><span class="n">beam_size</span> <span class="p">{</span><span class="n">beam_size</span><span class="p">}</span> \
<span class="o">--</span><span class="n">train_batch_size</span> <span class="p">{</span><span class="n">batch_size</span><span class="p">}</span> \
<span class="o">--</span><span class="n">eval_batch_size</span> <span class="p">{</span><span class="n">batch_size</span><span class="p">}</span> \
<span class="o">--</span><span class="n">learning_rate</span> <span class="p">{</span><span class="n">lr</span><span class="p">}</span> \
<span class="o">--</span><span class="n">num_train_epochs</span> <span class="p">{</span><span class="n">epochs</span><span class="p">}</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>2021-01-14 20:49:04.427229: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
01/14/2021 20:49:06 - INFO - __main__ - Namespace(adam_epsilon=1e-08, beam_size=10, config_name='', dev_filename='./java/valid.jsonl', do_eval=True, do_lower_case=True, do_test=False, do_train=True, eval_batch_size=8, eval_steps=-1, gradient_accumulation_steps=1, learning_rate=5e-05, load_model_path=None, local_rank=-1, max_grad_norm=1.0, max_source_length=256, max_steps=-1, max_target_length=48, model_name_or_path='microsoft/codebert-base', model_type='roberta', no_cuda=False, num_train_epochs=10, output_dir='model/java', seed=42, test_filename=None, tokenizer_name='', train_batch_size=8, train_filename='./java/train.jsonl', train_steps=-1, warmup_steps=0, weight_decay=0.0)
01/14/2021 20:49:06 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False
01/14/2021 20:49:06 - INFO - filelock - Lock 140293701425752 acquired on /root/.cache/torch/transformers/08477dcecf305af90229876aa01e4b0f3594dc8c638985a72277f39ea7d8d0c3.7fb14267817b1d26bb44a57cd5aa2fc003c25e87b75ef77e9c55c4804675b4cf.lock
Downloading: 100% 499M/499M [00:06<00:00, 73.5MB/s]
01/14/2021 20:49:13 - INFO - filelock - Lock 140293701425752 released on /root/.cache/torch/transformers/08477dcecf305af90229876aa01e4b0f3594dc8c638985a72277f39ea7d8d0c3.7fb14267817b1d26bb44a57cd5aa2fc003c25e87b75ef77e9c55c4804675b4cf.lock
01/14/2021 20:49:30 - INFO - __main__ - *** Example ***
01/14/2021 20:49:30 - INFO - __main__ - idx: 0
01/14/2021 20:49:30 - INFO - __main__ - source_tokens: ['<s>', 'public', '_static', '_void', '_check', 'j', 'av', 'ain', 'ternal', 'access', '(', 'il', 'og', 'ger', '_logger', ')', '_{', '_if', '_(', 'log', 'ger', '_==', '_null', '_||', '_!', 'java', 'version', '.', 'is', 'at', 'le', 'ast', '(', 'java', 'version', '.', 'java', '_', '9', '))', '_{', '_//', '_older', '_java', '_versions', '_are', '_fine', '_with', '_the', '_reflection', '_return', ';', '_}', '_map', '<', 'string', ',', '_package', 'access', 'requ', 'irement', '[]', '>', '_requirements', '_=', '_new', '_tre', 'em', 'ap', '<', 'string', ',', '_package', 'access', 'requ', 'irement', '[]', '>', '();', '_requirements', '.', 'put', '("', 'java', '.', 'base', '",', '_new', '_package', 'access', 'requ', 'irement', '[]', '_{', '_create', 'requ', 'irement', '(', 'false', ',', '_"', 'j', 'dk', '.', 'internal', '.', 'ref', '"),', '_create', 'requ', 'irement', '(', 'true', ',', '_"', 'java', '.', 'lang', '"),', '_create', 'requ', 'irement', '(', 'true', ',', '_"', 'java', '.', 'n', 'io', '"),', '_create', 'requ', 'irement', '(', 'true', ',', '_"', 'sun', '.', 'n', 'io', '.', 'ch', '")', '_});', '_requirements', '.', 'put', '("', 'j', 'dk', '.', 'management', '",', '_get', 'j', 'dk', 'management', 'requ', 'irements', '());', '_requirements', '.', 'put', '("', 'java', '.', 'management', '",', '_new', '_package', 'access', 'requ', 'irement', '[]', '_{', '_create', 'requ', 'irement', '(', 'true', ',', '_"', 'sun', '.', 'management', '")', '_});', '_check', 'package', 'requ', 'irements', '(', 'log', 'ger', ',', '_requirements', ');', '_}', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - source_ids: 0 15110 25156 13842 1649 267 1469 1851 46378 28300 1640 718 2154 2403 37764 43 25522 114 36 12376 2403 45994 23796 45056 27785 43830 21747 4 354 415 459 1988 1640 43830 21747 4 43830 1215 466 35122 25522 21277 2530 46900 7952 32 2051 19 5 12456 671 131 35524 5456 41552 20951 6 3737 28300 42172 34074 48992 15698 3471 5457 92 6110 991 1115 41552 20951 6 3737 28300 42172 34074 48992 15698 47006 3471 4 9179 46469 43830 4 11070 1297 92 3737 28300 42172 34074 48992 25522 1045 42172 34074 1640 22303 6 22 267 43357 4 37559 4 13043 16844 1045 42172 34074 1640 29225 6 22 43830 4 32373 16844 1045 42172 34074 1640 29225 6 22 43830 4 282 1020 16844 1045 42172 34074 1640 29225 6 22 21381 4 282 1020 4 611 8070 47771 3471 4 9179 46469 267 43357 4 14668 1297 120 267 43357 14668 42172 48227 49291 3471 4 9179 46469 43830 4 14668 1297 92 3737 28300 42172 34074 48992 25522 1045 42172 34074 1640 29225 6 22 21381 4 14668 8070 47771 1649 46181 42172 48227 1640 12376 2403 6 3471 4397 35524 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - source_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - target_tokens: ['<s>', 'prints', '_warning', '_to', '_given', '_if', '_haz', 'el', 'cast', '_is', '_not', '_provided', '_a', '_sufficient', '_access', '_to', '_java', '_internal', '_packages', '_on', '_java', '_9', '_and', '_newer', '.', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - target_ids: 0 31553 2892 7 576 114 32468 523 5182 16 45 1286 10 7719 899 7 46900 3425 8368 15 46900 361 8 13964 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - target_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - *** Example ***
01/14/2021 20:49:30 - INFO - __main__ - idx: 1
01/14/2021 20:49:30 - INFO - __main__ - source_tokens: ['<s>', 'public', '_void', '_marsh', 'all', '(', 's', 'ct', 'e', '20', 'pl', 'use', 'mb', 'edd', 'edd', 'est', 'inations', 'ettings', '_s', 'ct', 'e', '20', 'pl', 'use', 'mb', 'edd', 'edd', 'est', 'inations', 'ettings', ',', '_protocol', 'm', 'arsh', 'all', 'er', '_protocol', 'm', 'arsh', 'all', 'er', ')', '_{', '_if', '_(', 's', 'ct', 'e', '20', 'pl', 'use', 'mb', 'edd', 'edd', 'est', 'inations', 'ettings', '_==', '_null', ')', '_{', '_throw', '_new', '_s', 'dk', 'client', 'ex', 'ception', '("', 'in', 'valid', '_argument', '_passed', '_to', '_marsh', 'all', '(', '...)', '");', '_}', '_try', '_{', '_}', '_catch', '_(', 'ex', 'ception', '_e', ')', '_{', '_throw', '_new', '_s', 'dk', 'client', 'ex', 'ception', '("', 'un', 'able', '_to', '_marsh', 'all', '_request', '_to', '_json', ':', '_"', '_+', '_e', '.', 'get', 'message', '(),', '_e', ');', '_}', '_}', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - source_ids: 0 15110 13842 16377 1250 1640 29 3894 242 844 2911 3698 6648 13093 13093 990 17808 48496 579 3894 242 844 2911 3698 6648 13093 13093 990 17808 48496 6 11883 119 14980 1250 254 11883 119 14980 1250 254 43 25522 114 36 29 3894 242 844 2911 3698 6648 13093 13093 990 17808 48496 45994 23796 43 25522 3211 92 579 43357 38557 3463 20900 46469 179 42679 4795 1595 7 16377 1250 1640 41137 45751 35524 860 25522 35524 2916 36 3463 20900 364 43 25522 3211 92 579 43357 38557 3463 20900 46469 879 868 7 16377 1250 2069 7 49133 35 22 2055 364 4 6460 44773 49196 364 4397 35524 35524 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - source_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - target_tokens: ['<s>', 'm', 'arsh', 'all', '_the', '_given', '_parameter', '_object', '.', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - target_ids: 0 119 14980 1250 5 576 43797 7626 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - target_mask: 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - *** Example ***
01/14/2021 20:49:30 - INFO - __main__ - idx: 2
01/14/2021 20:49:30 - INFO - __main__ - source_tokens: ['<s>', '@', 'over', 'ride', '_public', '_void', '_pref', 'etch', 'token', '(', 'final', '_file', '_token', 'file', ',', '_final', '_props', '_props', ',', '_final', '_logger', '_logger', ')', '_throws', '_had', 'oop', 'security', 'man', 'age', 'rex', 'ception', '_{', '_final', '_string', '_us', 'ert', 'op', 'roxy', '_=', '_props', '.', 'get', 'string', '(', 'job', 'properties', '.', 'user', '_', 'to', '_', 'proxy', ');', '_logger', '.', 'info', '("', 'getting', '_had', 'oop', '_tokens', '_based', '_on', '_props', '_for', '_"', '_+', '_us', 'ert', 'op', 'roxy', ');', '_dop', 'ref', 'etch', '(', 'token', 'file', ',', '_props', ',', '_logger', ',', '_us', 'ert', 'op', 'roxy', ');', '_}', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - source_ids: 0 1039 2137 23167 285 13842 33284 29094 46657 1640 6156 2870 19233 21710 6 507 26504 26504 6 507 37764 37764 43 6989 56 18042 15506 397 1580 19633 20900 25522 507 6755 201 2399 1517 46963 5457 26504 4 6460 20951 1640 30056 47276 4 12105 1215 560 1215 47315 4397 37764 4 23999 46469 31315 56 18042 22121 716 15 26504 13 22 2055 201 2399 1517 46963 4397 32331 13043 29094 1640 46657 21710 6 26504 6 37764 6 201 2399 1517 46963 4397 35524 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - source_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - target_tokens: ['<s>', '/*', '_gets', '_had', 'oop', '_tokens', '_for', '_a', '_user', '_to', '_run', '_map', 'red', '/', 'h', 'ive', '_jobs', '_on', '_a', '_secured', '_cluster', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - target_ids: 0 49051 1516 56 18042 22121 13 10 3018 7 422 5456 2050 73 298 2088 1315 15 10 5288 18016 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - target_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - *** Example ***
01/14/2021 20:49:30 - INFO - __main__ - idx: 3
01/14/2021 20:49:30 - INFO - __main__ - source_tokens: ['<s>', '@', 'over', 'ride', '_public', '_<', 'y', '>', '_singular', 'attribute', '<', 'x', ',', '_y', '>', '_get', 'decl', 'ared', 'id', '(', 'class', '<', 'y', '>', '_param', 'class', ')', '_{', '_if', '_(', 'id', 'attribute', '_!=', '_null', ')', '_{', '_if', '_(', 'id', 'attribute', '.', 'get', 'j', 'av', 'at', 'ype', '().', 'equ', 'als', '(', 'param', 'class', ')', '_&&', '_!', 'is', 'id', 'class', ')', '_{', '_return', '_(', 'sing', 'ular', 'attribute', '<', 'x', ',', '_y', '>)', '_id', 'attribute', ';', '_}', '_}', '_on', 'error', '();', '_return', '_null', ';', '_}', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - source_ids: 0 1039 2137 23167 285 28696 219 15698 23429 49202 41552 1178 6 1423 15698 120 32639 6537 808 1640 4684 41552 219 15698 40206 4684 43 25522 114 36 808 49202 49333 23796 43 25522 114 36 808 49202 4 6460 267 1469 415 37356 49123 8198 1536 1640 46669 4684 43 48200 27785 354 808 4684 43 25522 671 36 26058 8244 49202 41552 1178 6 1423 49798 13561 49202 131 35524 35524 15 44223 47006 671 23796 131 35524 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - source_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - target_tokens: ['<s>', '/*', '_(', 'non', '-', 'j', 'av', 'ad', 'oc', ')', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - target_ids: 0 49051 36 13424 12 267 1469 625 1975 43 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - target_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - *** Example ***
01/14/2021 20:49:30 - INFO - __main__ - idx: 4
01/14/2021 20:49:30 - INFO - __main__ - source_tokens: ['<s>', 'public', '_void', '_sync', '(', 'bo', 'olean', '_syn', 'call', 'se', 'gments', ')', '_{', '_commit', 'log', 'se', 'gment', '_current', '_=', '_alloc', 'ator', '.', 'all', 'ocating', 'from', '();', '_for', '_(', 'commit', 'log', 'se', 'gment', '_segment', '_:', '_alloc', 'ator', '.', 'get', 'act', 'ives', 'eg', 'ments', '())', '_{', '_if', '_(!', 'sync', 'all', 'se', 'gments', '_&&', '_segment', '.', 'id', '_>', '_current', '.', 'id', ')', '_return', ';', '_segment', '.', 'sync', '();', '_}', '_}', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - source_ids: 0 15110 13842 22785 1640 3983 48547 17796 16395 1090 30237 43 25522 6225 12376 1090 10757 595 5457 42793 2630 4 1250 18106 7761 47006 13 36 42721 12376 1090 10757 2835 4832 42793 2630 4 6460 7257 3699 3733 2963 49338 25522 114 48209 45176 1250 1090 30237 48200 2835 4 808 8061 595 4 808 43 671 131 2835 4 45176 47006 35524 35524 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - source_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:30 - INFO - __main__ - target_tokens: ['<s>', 'forces', '_a', '_disk', '_flush', '_on', '_the', '_commit', '_log', '_files', '_that', '_need', '_it', '.', '_blocking', '.', '</s>']
01/14/2021 20:49:30 - INFO - __main__ - target_ids: 0 34532 10 21675 24841 15 5 6225 7425 6773 14 240 24 4 8890 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
01/14/2021 20:49:30 - INFO - __main__ - target_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01/14/2021 20:49:33 - INFO - __main__ - ***** Running training *****
01/14/2021 20:49:33 - INFO - __main__ - Num examples = 2809
01/14/2021 20:49:33 - INFO - __main__ - Batch size = 8
01/14/2021 20:49:33 - INFO - __main__ - Num epoch = 10
epoch 0 loss 6.8534: 100% 352/352 [02:53<00:00, 2.03it/s]
01/14/2021 20:52:27 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 20:52:27 - INFO - __main__ - Num examples = 88
01/14/2021 20:52:27 - INFO - __main__ - Batch size = 8
01/14/2021 20:52:29 - INFO - __main__ - eval_ppl = 420.66683
01/14/2021 20:52:29 - INFO - __main__ - global_step = 353
01/14/2021 20:52:29 - INFO - __main__ - train_loss = 6.8534
01/14/2021 20:52:29 - INFO - __main__ - ********************
01/14/2021 20:52:31 - INFO - __main__ - Best ppl:420.66683
01/14/2021 20:52:31 - INFO - __main__ - ********************
Total: 88
01/14/2021 20:52:58 - INFO - __main__ - bleu-4 = 9.79
01/14/2021 20:52:58 - INFO - __main__ - ********************
01/14/2021 20:52:58 - INFO - __main__ - Best bleu:9.79
01/14/2021 20:52:58 - INFO - __main__ - ********************
epoch 1 loss 5.2249: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 20:55:58 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 20:55:58 - INFO - __main__ - Num examples = 88
01/14/2021 20:55:58 - INFO - __main__ - Batch size = 8
01/14/2021 20:56:00 - INFO - __main__ - eval_ppl = 223.30135
01/14/2021 20:56:00 - INFO - __main__ - global_step = 705
01/14/2021 20:56:00 - INFO - __main__ - train_loss = 5.2249
01/14/2021 20:56:00 - INFO - __main__ - ********************
01/14/2021 20:56:02 - INFO - __main__ - Best ppl:223.30135
01/14/2021 20:56:02 - INFO - __main__ - ********************
Total: 88
01/14/2021 20:56:30 - INFO - __main__ - bleu-4 = 10.3
01/14/2021 20:56:30 - INFO - __main__ - ********************
01/14/2021 20:56:30 - INFO - __main__ - Best bleu:10.3
01/14/2021 20:56:30 - INFO - __main__ - ********************
epoch 2 loss 4.4676: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 20:59:31 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 20:59:31 - INFO - __main__ - Num examples = 88
01/14/2021 20:59:31 - INFO - __main__ - Batch size = 8
01/14/2021 20:59:32 - INFO - __main__ - eval_ppl = 167.43889
01/14/2021 20:59:32 - INFO - __main__ - global_step = 1057
01/14/2021 20:59:32 - INFO - __main__ - train_loss = 4.4676
01/14/2021 20:59:32 - INFO - __main__ - ********************
01/14/2021 20:59:35 - INFO - __main__ - Best ppl:167.43889
01/14/2021 20:59:35 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:00:05 - INFO - __main__ - bleu-4 = 10.68
01/14/2021 21:00:05 - INFO - __main__ - ********************
01/14/2021 21:00:05 - INFO - __main__ - Best bleu:10.68
01/14/2021 21:00:05 - INFO - __main__ - ********************
epoch 3 loss 3.8263: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:03:05 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:03:05 - INFO - __main__ - Num examples = 88
01/14/2021 21:03:05 - INFO - __main__ - Batch size = 8
01/14/2021 21:03:07 - INFO - __main__ - eval_ppl = 160.25635
01/14/2021 21:03:07 - INFO - __main__ - global_step = 1409
01/14/2021 21:03:07 - INFO - __main__ - train_loss = 3.8263
01/14/2021 21:03:07 - INFO - __main__ - ********************
01/14/2021 21:03:10 - INFO - __main__ - Best ppl:160.25635
01/14/2021 21:03:10 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:03:38 - INFO - __main__ - bleu-4 = 11.04
01/14/2021 21:03:38 - INFO - __main__ - ********************
01/14/2021 21:03:38 - INFO - __main__ - Best bleu:11.04
01/14/2021 21:03:38 - INFO - __main__ - ********************
epoch 4 loss 3.2797: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:06:38 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:06:38 - INFO - __main__ - Num examples = 88
01/14/2021 21:06:38 - INFO - __main__ - Batch size = 8
01/14/2021 21:06:40 - INFO - __main__ - eval_ppl = 152.19858
01/14/2021 21:06:40 - INFO - __main__ - global_step = 1761
01/14/2021 21:06:40 - INFO - __main__ - train_loss = 3.2797
01/14/2021 21:06:40 - INFO - __main__ - ********************
01/14/2021 21:06:42 - INFO - __main__ - Best ppl:152.19858
01/14/2021 21:06:42 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:07:14 - INFO - __main__ - bleu-4 = 10.36
01/14/2021 21:07:14 - INFO - __main__ - ********************
epoch 5 loss 2.8204: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:10:12 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:10:12 - INFO - __main__ - Num examples = 88
01/14/2021 21:10:12 - INFO - __main__ - Batch size = 8
01/14/2021 21:10:13 - INFO - __main__ - eval_ppl = 150.95443
01/14/2021 21:10:13 - INFO - __main__ - global_step = 2113
01/14/2021 21:10:13 - INFO - __main__ - train_loss = 2.8204
01/14/2021 21:10:13 - INFO - __main__ - ********************
01/14/2021 21:10:16 - INFO - __main__ - Best ppl:150.95443
01/14/2021 21:10:16 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:10:45 - INFO - __main__ - bleu-4 = 11.57
01/14/2021 21:10:45 - INFO - __main__ - ********************
01/14/2021 21:10:45 - INFO - __main__ - Best bleu:11.57
01/14/2021 21:10:45 - INFO - __main__ - ********************
epoch 6 loss 2.4442: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:13:46 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:13:46 - INFO - __main__ - Num examples = 88
01/14/2021 21:13:46 - INFO - __main__ - Batch size = 8
01/14/2021 21:13:47 - INFO - __main__ - eval_ppl = 156.69898
01/14/2021 21:13:47 - INFO - __main__ - global_step = 2465
01/14/2021 21:13:47 - INFO - __main__ - train_loss = 2.4442
01/14/2021 21:13:47 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:14:17 - INFO - __main__ - bleu-4 = 10.65
01/14/2021 21:14:17 - INFO - __main__ - ********************
epoch 7 loss 2.1565: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:17:15 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:17:15 - INFO - __main__ - Num examples = 88
01/14/2021 21:17:15 - INFO - __main__ - Batch size = 8
01/14/2021 21:17:16 - INFO - __main__ - eval_ppl = 163.34726
01/14/2021 21:17:16 - INFO - __main__ - global_step = 2817
01/14/2021 21:17:16 - INFO - __main__ - train_loss = 2.1565
01/14/2021 21:17:16 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:17:50 - INFO - __main__ - bleu-4 = 10.56
01/14/2021 21:17:50 - INFO - __main__ - ********************
epoch 8 loss 1.9398: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:20:47 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:20:47 - INFO - __main__ - Num examples = 88
01/14/2021 21:20:47 - INFO - __main__ - Batch size = 8
01/14/2021 21:20:49 - INFO - __main__ - eval_ppl = 166.41823
01/14/2021 21:20:49 - INFO - __main__ - global_step = 3169
01/14/2021 21:20:49 - INFO - __main__ - train_loss = 1.9398
01/14/2021 21:20:49 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:21:26 - INFO - __main__ - bleu-4 = 10.74
01/14/2021 21:21:26 - INFO - __main__ - ********************
epoch 9 loss 1.7877: 100% 352/352 [02:57<00:00, 1.98it/s]
01/14/2021 21:24:24 - INFO - __main__ -
***** Running evaluation *****
01/14/2021 21:24:24 - INFO - __main__ - Num examples = 88
01/14/2021 21:24:24 - INFO - __main__ - Batch size = 8
01/14/2021 21:24:25 - INFO - __main__ - eval_ppl = 169.37057
01/14/2021 21:24:25 - INFO - __main__ - global_step = 3521
01/14/2021 21:24:25 - INFO - __main__ - train_loss = 1.7877
01/14/2021 21:24:25 - INFO - __main__ - ********************
Total: 88
01/14/2021 21:24:59 - INFO - __main__ - bleu-4 = 10.28
01/14/2021 21:24:59 - INFO - __main__ - ********************
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Yay! Our model has finished baking! Looking at the logs, validation perplexity bottomed out at epoch 5 (150.95) and crept back up afterwards (a hint of overfitting), and the best validation BLEU-4 of 11.57 also came at epoch 5. Let's now see how well that best checkpoint turned out by evaluating it on the test set!</p>
</div>
</div>
</div>
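<p>As a quick aside, the BLEU-4 numbers reported throughout these logs measure overlapping 1- to 4-grams between the generated comment and the reference comment. The sketch below is a simplified, smoothed sentence-level BLEU-4 in pure Python to show the idea; it is not the exact (corpus-level) implementation that <code>run.py</code> uses, so its scores will differ from the logged ones.</p>

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Count the n-grams appearing in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(reference, candidate):
    """Smoothed sentence-level BLEU-4: geometric mean of clipped
    1- to 4-gram precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # clipped overlap: a candidate n-gram only counts up to the
        # number of times it appears in the reference
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so one missing n-gram order doesn't zero the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty discourages very short candidates
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

ref = "decodes the given base64-encoded string .".split()
cand = "decodes the given base64 string .".split()
print(round(bleu4(ref, cand), 4))
```

A perfect match scores 1.0, and scores fall off quickly as higher-order n-grams stop matching, which is why even decent code summaries land in the ~10 BLEU range seen above.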
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">batch_size</span><span class="o">=</span><span class="mi">64</span>
<span class="n">dev_file</span><span class="o">=</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="n">lang</span><span class="si">}</span><span class="s2">/valid.jsonl"</span>
<span class="n">test_file</span><span class="o">=</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="n">lang</span><span class="si">}</span><span class="s2">/test.jsonl"</span>
<span class="n">test_model</span><span class="o">=</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">output_dir</span><span class="si">}</span><span class="s2">/checkpoint-best-bleu/pytorch_model.bin"</span> <span class="c1">#checkpoint for test</span>
<span class="o">!</span> python run.py <span class="err">\</span>
<span class="o">--</span><span class="n">do_test</span> \
<span class="o">--</span><span class="n">model_type</span> <span class="n">roberta</span> \
<span class="o">--</span><span class="n">model_name_or_path</span> <span class="n">microsoft</span><span class="o">/</span><span class="n">codebert</span><span class="o">-</span><span class="n">base</span> \
<span class="o">--</span><span class="n">load_model_path</span> <span class="p">{</span><span class="n">test_model</span><span class="p">}</span> \
<span class="o">--</span><span class="n">dev_filename</span> <span class="p">{</span><span class="n">dev_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">test_filename</span> <span class="p">{</span><span class="n">test_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">output_dir</span> <span class="p">{</span><span class="n">output_dir</span><span class="p">}</span> \
<span class="o">--</span><span class="n">max_source_length</span> <span class="p">{</span><span class="n">source_length</span><span class="p">}</span> \
<span class="o">--</span><span class="n">max_target_length</span> <span class="p">{</span><span class="n">target_length</span><span class="p">}</span> \
<span class="o">--</span><span class="n">beam_size</span> <span class="p">{</span><span class="n">beam_size</span><span class="p">}</span> \
<span class="o">--</span><span class="n">eval_batch_size</span> <span class="p">{</span><span class="n">batch_size</span><span class="p">}</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>2021-01-14 21:25:04.498200: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
01/14/2021 21:25:07 - INFO - __main__ - Namespace(adam_epsilon=1e-08, beam_size=10, config_name='', dev_filename='./java/valid.jsonl', do_eval=False, do_lower_case=False, do_test=True, do_train=False, eval_batch_size=64, eval_steps=-1, gradient_accumulation_steps=1, learning_rate=5e-05, load_model_path='model/java/checkpoint-best-bleu/pytorch_model.bin', local_rank=-1, max_grad_norm=1.0, max_source_length=256, max_steps=-1, max_target_length=48, model_name_or_path='microsoft/codebert-base', model_type='roberta', no_cuda=False, num_train_epochs=3, output_dir='model/java', seed=42, test_filename='./java/test.jsonl', tokenizer_name='', train_batch_size=8, train_filename=None, train_steps=-1, warmup_steps=0, weight_decay=0.0)
01/14/2021 21:25:07 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False
01/14/2021 21:25:23 - INFO - __main__ - reload model from model/java/checkpoint-best-bleu/pytorch_model.bin
01/14/2021 21:25:48 - INFO - __main__ - Test file: ./java/valid.jsonl
100% 2/2 [00:26<00:00, 13.34s/it]
Total: 88
01/14/2021 21:26:15 - INFO - __main__ - bleu-4 = 11.57
01/14/2021 21:26:15 - INFO - __main__ - ********************
01/14/2021 21:26:15 - INFO - __main__ - Test file: ./java/test.jsonl
100% 4/4 [00:55<00:00, 13.95s/it]
Total: 193
01/14/2021 21:27:11 - INFO - __main__ - bleu-4 = 9.74
01/14/2021 21:27:11 - INFO - __main__ - ********************
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's now load up our model and take it for a spin!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="nn">nn</span>
<span class="kn">from</span> <span class="nn">model</span> <span class="kn">import</span> <span class="n">Seq2Seq</span>
<span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">RobertaConfig</span><span class="p">,</span> <span class="n">RobertaModel</span>
<span class="n">config</span> <span class="o">=</span> <span class="n">RobertaConfig</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">pretrained_model</span><span class="p">)</span>
<span class="n">encoder</span> <span class="o">=</span> <span class="n">RobertaModel</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">pretrained_model</span><span class="p">,</span> <span class="n">config</span> <span class="o">=</span> <span class="n">config</span><span class="p">)</span>
<span class="n">decoder_layer</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">TransformerDecoderLayer</span><span class="p">(</span><span class="n">d_model</span><span class="o">=</span><span class="n">config</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">,</span> <span class="n">nhead</span><span class="o">=</span><span class="n">config</span><span class="o">.</span><span class="n">num_attention_heads</span><span class="p">)</span>
<span class="n">decoder</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">TransformerDecoder</span><span class="p">(</span><span class="n">decoder_layer</span><span class="p">,</span> <span class="n">num_layers</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Seq2Seq</span><span class="p">(</span><span class="n">encoder</span> <span class="o">=</span> <span class="n">encoder</span><span class="p">,</span><span class="n">decoder</span> <span class="o">=</span> <span class="n">decoder</span><span class="p">,</span><span class="n">config</span><span class="o">=</span><span class="n">config</span><span class="p">,</span>
<span class="n">beam_size</span><span class="o">=</span><span class="n">beam_size</span><span class="p">,</span><span class="n">max_length</span><span class="o">=</span><span class="n">target_length</span><span class="p">,</span>
<span class="n">sos_id</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">cls_token_id</span><span class="p">,</span><span class="n">eos_id</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">sep_token_id</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">Path</span><span class="p">(</span><span class="n">output_dir</span><span class="p">)</span><span class="o">/</span><span class="s2">"checkpoint-last/pytorch_model.bin"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>Seq2Seq(
(encoder): RobertaModel(
(embeddings): RobertaEmbeddings(
(word_embeddings): Embedding(50265, 768, padding_idx=1)
(position_embeddings): Embedding(514, 768, padding_idx=1)
(token_type_embeddings): Embedding(1, 768)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): RobertaEncoder(
(layer): ModuleList(
(0): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(4): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(5): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(6): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(7): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(8): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(9): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(10): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(11): RobertaLayer(
(attention): RobertaAttention(
(self): RobertaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): RobertaSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): RobertaIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): RobertaOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): RobertaPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(decoder): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(linear1): Linear(in_features=768, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=768, bias=True)
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
)
)
(dense): Linear(in_features=768, out_features=768, bias=True)
(lm_head): Linear(in_features=768, out_features=50265, bias=False)
(lsm): LogSoftmax()
)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">TEXT_TO_SUMMARIZE</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">mthd</span><span class="o">.</span><span class="n">values</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Code:'</span><span class="p">,</span> <span class="n">TEXT_TO_SUMMARIZE</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Original Comment:'</span><span class="p">,</span> <span class="n">df_val</span><span class="o">.</span><span class="n">cmt</span><span class="o">.</span><span class="n">values</span><span class="p">[</span><span class="n">idx</span><span class="p">])</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Code: public static byte[] decode(final string s) { int delta = s.endswith("==") ? 2 : s.endswith("=") ? 1 : 0; byte[] buffer = new byte[s.length() * bytes_per_unencoded_block / bytes_per_encoded_block - delta]; int mask = 0xff; int pos = 0; for (int i = 0; i < s.length(); i += bytes_per_encoded_block) { int c0 = decode_table[s.charat(i)]; int c1 = decode_table[s.charat(i + 1)]; buffer[pos++] = (byte) (((c0 << 2) | (c1 >> 4)) & mask); if (pos >= buffer.length) { return buffer; } int c2 = decode_table[s.charat(i + 2)]; buffer[pos++] = (byte) (((c1 << 4) | (c2 >> 2)) & mask); if (pos >= buffer.length) { return buffer; } int c3 = decode_table[s.charat(i + 3)]; buffer[pos++] = (byte) (((c2 << 6) | c3) & mask); } return buffer; }
Original Comment: decodes the given base64-encoded string.
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">run</span> <span class="kn">import</span> <span class="n">convert_examples_to_features</span><span class="p">,</span> <span class="n">Example</span>
<span class="k">class</span> <span class="nc">Args</span><span class="p">:</span>
<span class="n">max_source_length</span> <span class="o">=</span> <span class="n">source_length</span>
<span class="n">max_target_length</span> <span class="o">=</span> <span class="n">target_length</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">Args</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">get_preds</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">):</span>
<span class="n">ps</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">(),</span> <span class="n">total</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)):</span>
<span class="n">examples</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Example</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">,</span> <span class="n">target</span> <span class="o">=</span> <span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">eval_features</span> <span class="o">=</span> <span class="n">convert_examples_to_features</span><span class="p">(</span>
<span class="n">examples</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">stage</span><span class="o">=</span><span class="s1">'test'</span>
<span class="p">)</span>
<span class="n">source_ids</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">eval_features</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">source_ids</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="n">source_mask</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">eval_features</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">source_mask</span><span class="p">,</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="n">preds</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">source_ids</span> <span class="o">=</span> <span class="n">source_ids</span><span class="p">,</span> <span class="n">source_mask</span> <span class="o">=</span> <span class="n">source_mask</span><span class="p">)</span>
<span class="k">for</span> <span class="n">pred</span> <span class="ow">in</span> <span class="n">preds</span><span class="p">:</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span>
<span class="n">t</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="k">if</span> <span class="mi">0</span> <span class="ow">in</span> <span class="n">t</span><span class="p">:</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">[:</span><span class="n">t</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="mi">0</span><span class="p">)]</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">clean_up_tokenization_spaces</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">ps</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="k">return</span> <span class="n">ps</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>
<span class="n">preds</span> <span class="o">=</span> <span class="n">get_preds</span><span class="p">(</span><span class="n">df_val</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span>
<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df_val</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Code:'</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Original Comment:'</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Generated Comment:'</span><span class="p">,</span> <span class="n">preds</span><span class="p">[</span><span class="n">idx</span><span class="p">])</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'='</span><span class="o">*</span><span class="mi">40</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
Code: public static byte[] decode(final string s) { int delta = s.endswith("==") ? 2 : s.endswith("=") ? 1 : 0; byte[] buffer = new byte[s.length() * bytes_per_unencoded_block / bytes_per_encoded_block - delta]; int mask = 0xff; int pos = 0; for (int i = 0; i < s.length(); i += bytes_per_encoded_block) { int c0 = decode_table[s.charat(i)]; int c1 = decode_table[s.charat(i + 1)]; buffer[pos++] = (byte) (((c0 << 2) | (c1 >> 4)) & mask); if (pos >= buffer.length) { return buffer; } int c2 = decode_table[s.charat(i + 2)]; buffer[pos++] = (byte) (((c1 << 4) | (c2 >> 2)) & mask); if (pos >= buffer.length) { return buffer; } int c3 = decode_table[s.charat(i + 3)]; buffer[pos++] = (byte) (((c2 << 6) | c3) & mask); } return buffer; }
Original Comment: decodes the given base64-encoded string.
Generated Comment: decode encode a string representation of string
========================================
Code: private void extractapklib( artifact apklibartifact ) throws mojoexecutionexception { getunpackedlibhelper().extractapklib( apklibartifact ); // copy the assets to the the combinedassets folder. // add the apklib source and resource to the compile. // nb apklib sources are added to compilesourceroot because we may need to compile against them. // this means the apklib classes will be compiled into target/classes and packaged with this build. copyfolder( getunpackedlibassetsfolder( apklibartifact ), combinedassets ); final file apklibsourcefolder = getunpackedapklibsourcefolder( apklibartifact ); final list<string> resourceexclusions = arrays.aslist( "**/*.java", "**/*.aidl" ); projecthelper.addresource( project, apklibsourcefolder.getabsolutepath(), null, resourceexclusions ); project.addcompilesourceroot( apklibsourcefolder.getabsolutepath() ); }
Original Comment: extracts apklib and adds the assets and apklib sources and resources to the build.
Generated Comment: extracts the cp libraries from the given source library.
========================================
Code: static <t> t[] copy(object[] source, int from, int to, t[] arrayoftype) { t[] result = newarray(arrayoftype, to - from); system.arraycopy(source, from, result, 0, to - from); return result; }
Original Comment: equivalent to arrays.copyofrange(source, from, to, arrayoftype.getclass()).
Generated Comment: creates a new object from the array.
========================================
Code: private static runtimedelegate finddelegate() { runtimedelegate result=null; try { result=createruntimedelegatefromspi(); if(result==null) { result=createruntimedelegatefromconfigurationfile(); } if(result==null) { string delegateclassname = system.getproperty(application_engine_spi_property); if(delegateclassname!=null) { result=createruntimedelegateforclassname(delegateclassname); } } } catch (exception ex) { logger.warn("could not find application engine",ex); } return result; }
Original Comment: obtain an instance using the method described in }.
Generated Comment: /* package
========================================
Code: public static string getcategory(string eventsrcname) { if (eventsrcname == null) { return null; } int end = eventsrcname.lastindexof('.'); eventsrcname = eventsrcname.substring(0, end); if (checkstyle_package.equals(eventsrcname)) { return "misc"; } else if (!eventsrcname.startswith(checkstyle_package)) { return "extension"; } return eventsrcname.substring(eventsrcname.lastindexof('.') + 1); }
Original Comment: get the rule category from an audit event source name.
Generated Comment: returns the contents of the event name.
========================================
Code: private collection<artifact> getserverdependencies(final string servertype, final expressionevaluator expressionevaluator) throws componentconfigurationexception { try { final mavenproject project = (mavenproject) expressionevaluator.evaluate("${project}"); final string localrepo = (string) expressionevaluator.evaluate("${settings.localrepository}"); final artifactrepository localrepository = repositorysystem.createlocalrepository(new file(localrepo)); final repositoryrequest repositoryrequest = new defaultrepositoryrequest(); repositoryrequest.setremoterepositories(project.getremoteartifactrepositories()); repositoryrequest.setlocalrepository(localrepository); final artifactresolutionrequest request = new artifactresolutionrequest(repositoryrequest); request.setartifact(getserverartifact(servertype)); request.setresolvetransitively(true); final artifactresolutionresult result = repositorysystem.resolve(request); if (result.issuccess()) { return result.getartifacts(); } boolean first = true; final stringbuilder builder = new stringbuilder("cannot resolve dependencies: ["); for (final artifact artifact : result.getmissingartifacts()) { if (!first) { builder.append(','); } else { first = false; } builder.append(artifact.getgroupid()); builder.append(':'); builder.append(artifact.getartifactid()); builder.append(':'); builder.append(artifact.getversion()); } builder.append("]"); throw new componentconfigurationexception(builder.tostring()); } catch (final expressionevaluationexception e) { throw new componentconfigurationexception("error evaluating expression", e); } catch (final invalidrepositoryexception e) { throw new componentconfigurationexception("error resolving local repository", e); } }
Original Comment: resolve the ldap server type artifact and its dependencies.
Generated Comment: gets the repositories from the repository.
========================================
Code: private void frame4() { long currenttime = system.currenttimemillis(); // xxx: lots of dummy value // record trade information in trade table. // insert into trade (t_id, t_dts, t_st_id, t_tt_id, t_is_cash, // t_s_symb, t_qty, t_bid_price, t_ca_id, t_exec_name, t_trade_price, // t_chrg, t_comm, t_tax, t_lifo) values (...) string sql = string.format("insert into trade (t_id, t_dts, t_st_id, t_tt_id, " + "t_is_cash, t_s_symb, t_qty, t_bid_price, t_ca_id, t_exec_name, " + "t_trade_price, t_chrg, t_comm, t_tax, t_lifo) values (%d, %d, '%s', " + "'%s', %d, '%s', %d, %f, %d, '%s', %f, %f, %f, %f, %d)", paramhelper.gettradeid(), currenttime, statusid, paramhelper.gettradetypeid(), 1, paramhelper.getsymbol(), paramhelper.gettradeqty(), marketprice, paramhelper.getacctid(), "exec_name", paramhelper.gettradeprice(), 0.0, 0.0, 0.0, 1); executeupdate(sql); // todo: implement this (not in the simplified version) // record pending trade information in trade_request table // if this trade is a limit trade // insert into trade_request (tr_t_id, tr_tt_id, tr_s_symb, tr_qty, // tr_bid_price, tr_b_id) values (...) // record trade information in trade_history table // insert into trade_history (th_t_id, th_dts, th_st_id) values (...) sql = string.format("insert into trade_history (th_t_id, th_dts, th_st_id) values " + "(%d, %d, '%s')", paramhelper.gettradeid(), currenttime, statusid); executeupdate(sql); }
Original Comment: record the trade request by making all related updates
Generated Comment: this method is used to create the database.
========================================
Code: protected string getquery() { final stringbuilder ret = new stringbuilder(); try { final string clazzname; if (efapssystemconfiguration.get().containsattributevalue("org.efaps.kernel.index.querybuilder")) { clazzname = efapssystemconfiguration.get().getattributevalue("org.efaps.kernel.index.querybuilder"); } else { clazzname = "org.efaps.esjp.admin.index.lucencequerybuilder"; } final class<?> clazz = class.forname(clazzname, false, efapsclassloader.getinstance()); final object obj = clazz.newinstance(); final method method = clazz.getmethod("getquery4dimvalues", string.class, list.class, list.class); final object newquery = method.invoke(obj, getcurrentquery(), getincluded(), getexcluded()); ret.append(newquery); } catch (final efapsexception | classnotfoundexception | instantiationexception | illegalaccessexception | nosuchmethodexception | securityexception | illegalargumentexception | invocationtargetexception e) { indexsearch.log.error("catched", e); ret.append(getcurrentquery()); } return ret.tostring(); }
Original Comment: gets the query.
Generated Comment: get the query instance.
========================================
Code: private languagedata findlanguage(final string locale) { for (final languagedata languagedata : languagedatadao.getall()) { if (languagedata.getlanguagecode().equalsignorecase(locale)) { return languagedata; } } return null; }
Original Comment: find language.
Generated Comment: gets the specified locale.
========================================
Code: private standardintrospectionresponse callstandardintrospection(string parameters) { if (parameters == null) { // authlete returns different error codes for null and an empty string. // 'null' is regarded as a caller's error. an empty string is regarded // as a client application's error. parameters = ""; } // create a request for authlete's /api/auth/introspection/standard api. standardintrospectionrequest request = new standardintrospectionrequest() .setparameters(parameters); try { // call authlete's /api/auth/introspection/standard api. return mapi.standardintrospection(request); } catch (authleteapiexception e) { // the api call failed. throw apifailure("/api/auth/introspection/standard", e); } }
Original Comment: call authlete's api.
Generated Comment: returns a set of authentication object.
========================================
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The model seems to be doing a decent job, but if you play with it some more you'll notice it is mostly taking the method's name and using it to guide the comment. That makes sense, but it suggests the model isn't learning much beyond this name-to-comment association, at least at this small scale. Let's dig a bit deeper by looking at the validation examples it struggles with the most, i.e. the ones with the highest loss.</p>
</div>
</div>
</div>
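<p>One quick way to probe the "it's mostly copying the method name" hypothesis is to measure how often subtokens of the method name reappear in the generated comment. This is a sketch I'm adding for illustration, not part of the original notebook: <code>name_overlap</code> is a hypothetical helper, and it assumes the method source looks like the lowercased Java snippets above (name immediately before the first parenthesis).</p>

```python
import re

def name_overlap(method_src: str, comment: str) -> float:
    """Fraction of the method-name subtokens that reappear in the comment.

    method_src: raw method source, e.g. "public static byte[] decode(final string s) { ... }"
    comment:    the generated (or reference) comment.
    """
    # The identifier right before the first '(' is the method name.
    match = re.search(r"(\w+)\s*\(", method_src)
    if not match:
        return 0.0
    name = match.group(1)
    # Split snake_case and camelCase names into lowercase subtokens.
    subtokens = [t.lower() for t in re.split(r"_|(?<=[a-z])(?=[A-Z])", name) if t]
    comment_words = set(re.findall(r"\w+", comment.lower()))
    hits = sum(1 for t in subtokens if t in comment_words)
    return hits / len(subtokens)
```

<p>Averaging this score over the validation predictions (versus the same score for the reference comments) would give a rough sense of how much the model leans on the method name relative to human-written comments.</p>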
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">get_preds_losses</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">):</span>
<span class="n">ps</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">losses</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">(),</span> <span class="n">total</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)):</span>
<span class="n">examples</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Example</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">,</span> <span class="n">target</span> <span class="o">=</span> <span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">eval_features</span> <span class="o">=</span> <span class="n">convert_examples_to_features</span><span class="p">(</span>
<span class="n">examples</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">stage</span><span class="o">=</span><span class="s1">'test'</span>
<span class="p">)</span>
<span class="n">source_ids</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">f</span><span class="o">.</span><span class="n">source_ids</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">eval_features</span><span class="p">],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="n">source_mask</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">f</span><span class="o">.</span><span class="n">source_mask</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">eval_features</span><span class="p">],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="n">target_ids</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">f</span><span class="o">.</span><span class="n">target_ids</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">eval_features</span><span class="p">],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="n">target_mask</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">([</span><span class="n">f</span><span class="o">.</span><span class="n">target_mask</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">eval_features</span><span class="p">],</span> <span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="n">_</span><span class="p">,</span> <span class="n">loss</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
<span class="n">source_ids</span> <span class="o">=</span> <span class="n">source_ids</span><span class="p">,</span> <span class="n">source_mask</span> <span class="o">=</span> <span class="n">source_mask</span><span class="p">,</span>
<span class="n">target_ids</span> <span class="o">=</span> <span class="n">target_ids</span><span class="p">,</span> <span class="n">target_mask</span> <span class="o">=</span> <span class="n">target_mask</span>
<span class="p">)</span>
<span class="n">preds</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">source_ids</span> <span class="o">=</span> <span class="n">source_ids</span><span class="p">,</span> <span class="n">source_mask</span> <span class="o">=</span> <span class="n">source_mask</span><span class="p">)</span>
<span class="k">for</span> <span class="n">pred</span> <span class="ow">in</span> <span class="n">preds</span><span class="p">:</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span>
<span class="n">t</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="k">if</span> <span class="mi">0</span> <span class="ow">in</span> <span class="n">t</span><span class="p">:</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">[:</span><span class="n">t</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="mi">0</span><span class="p">)]</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">clean_up_tokenization_spaces</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">ps</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">())</span>
<span class="k">return</span> <span class="n">ps</span><span class="p">,</span> <span class="n">losses</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_head</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">ps</span><span class="p">,</span> <span class="n">losses</span> <span class="o">=</span> <span class="n">get_preds_losses</span><span class="p">(</span><span class="n">df_head</span><span class="p">)</span>
<span class="n">df_head</span><span class="p">[</span><span class="s1">'pred'</span><span class="p">]</span> <span class="o">=</span> <span class="n">ps</span>
<span class="n">df_head</span><span class="p">[</span><span class="s1">'loss'</span><span class="p">]</span> <span class="o">=</span> <span class="n">losses</span>
<span class="n">df_sorted_losses</span> <span class="o">=</span> <span class="n">df_head</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="s1">'loss'</span><span class="p">,</span> <span class="n">ascending</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df_sorted_losses</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Code:'</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">mthd</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Original Comment:'</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">cmt</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Generated Comment:'</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">pred</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">loss</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'='</span><span class="o">*</span><span class="mi">40</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
Code: private collection<artifact> getserverdependencies(final string servertype, final expressionevaluator expressionevaluator) throws componentconfigurationexception { try { final mavenproject project = (mavenproject) expressionevaluator.evaluate("${project}"); final string localrepo = (string) expressionevaluator.evaluate("${settings.localrepository}"); final artifactrepository localrepository = repositorysystem.createlocalrepository(new file(localrepo)); final repositoryrequest repositoryrequest = new defaultrepositoryrequest(); repositoryrequest.setremoterepositories(project.getremoteartifactrepositories()); repositoryrequest.setlocalrepository(localrepository); final artifactresolutionrequest request = new artifactresolutionrequest(repositoryrequest); request.setartifact(getserverartifact(servertype)); request.setresolvetransitively(true); final artifactresolutionresult result = repositorysystem.resolve(request); if (result.issuccess()) { return result.getartifacts(); } boolean first = true; final stringbuilder builder = new stringbuilder("cannot resolve dependencies: ["); for (final artifact artifact : result.getmissingartifacts()) { if (!first) { builder.append(','); } else { first = false; } builder.append(artifact.getgroupid()); builder.append(':'); builder.append(artifact.getartifactid()); builder.append(':'); builder.append(artifact.getversion()); } builder.append("]"); throw new componentconfigurationexception(builder.tostring()); } catch (final expressionevaluationexception e) { throw new componentconfigurationexception("error evaluating expression", e); } catch (final invalidrepositoryexception e) { throw new componentconfigurationexception("error resolving local repository", e); } }
Original Comment: resolve the ldap server type artifact and its dependencies.
Generated Comment: gets the repository from the repository.
24.875783920288086
========================================
Code: public static byte[] decode(final string s) { int delta = s.endswith("==") ? 2 : s.endswith("=") ? 1 : 0; byte[] buffer = new byte[s.length() * bytes_per_unencoded_block / bytes_per_encoded_block - delta]; int mask = 0xff; int pos = 0; for (int i = 0; i < s.length(); i += bytes_per_encoded_block) { int c0 = decode_table[s.charat(i)]; int c1 = decode_table[s.charat(i + 1)]; buffer[pos++] = (byte) (((c0 << 2) | (c1 >> 4)) & mask); if (pos >= buffer.length) { return buffer; } int c2 = decode_table[s.charat(i + 2)]; buffer[pos++] = (byte) (((c1 << 4) | (c2 >> 2)) & mask); if (pos >= buffer.length) { return buffer; } int c3 = decode_table[s.charat(i + 3)]; buffer[pos++] = (byte) (((c2 << 6) | c3) & mask); } return buffer; }
Original Comment: decodes the given base64-encoded string.
Generated Comment: encodes a string value from a string.
24.304515838623047
========================================
Code: @override public void init(configurationvalueprovider... configurationvalueproviders) { if (configurationvalueproviders != null) { for (configurationproperty property : getcontainer().properties.values()) { property.init(configurationvalueproviders); } } }
Original Comment: override default values for properties with the given configurationproviders.
Generated Comment: configures all the options in the given configuration.
24.276317596435547
========================================
Code: private static boolean validatepart(string part, boolean isfinalpart) { // these tests could be collapsed into one big boolean expression, but // they have been left as independent tests for clarity. if (part.length() < 1 || part.length() > max_domain_part_length) { return false; } /* * gwt claims to support java.lang.character's char-classification methods, but it actually only * works for ascii. so for now, assume any non-ascii characters are valid. the only place this * seems to be documented is here: * http://osdir.com/ml/googlewebtoolkitcontributors/2010-03/msg00178.html * * <p>ascii characters in the part are expected to be valid per rfc 1035, with underscore also * being allowed due to widespread practice. */ string asciichars = charmatcher.ascii().retainfrom(part); if (!part_char_matcher.matchesallof(asciichars)) { return false; } // no initial or final dashes or underscores. if (dash_matcher.matches(part.charat(0)) || dash_matcher.matches(part.charat(part.length() - 1))) { return false; } /* * note that we allow (in contravention of a strict interpretation of the relevant rfcs) domain * parts other than the last may begin with a digit (for example, "3com.com"). it's important to * disallow an initial digit in the last part; it's the only thing that stops an ipv4 numeric * address like 127.0.0.1 from looking like a valid domain name. */ if (isfinalpart && charmatcher.digit().matches(part.charat(0))) { return false; } return true; }
Original Comment: helper method for }. validates that one part of a domain name is valid.
Generated Comment: parses a string representation of the given string.
24.256574630737305
========================================
Code: private void extractapklib( artifact apklibartifact ) throws mojoexecutionexception { getunpackedlibhelper().extractapklib( apklibartifact ); // copy the assets to the the combinedassets folder. // add the apklib source and resource to the compile. // nb apklib sources are added to compilesourceroot because we may need to compile against them. // this means the apklib classes will be compiled into target/classes and packaged with this build. copyfolder( getunpackedlibassetsfolder( apklibartifact ), combinedassets ); final file apklibsourcefolder = getunpackedapklibsourcefolder( apklibartifact ); final list<string> resourceexclusions = arrays.aslist( "**/*.java", "**/*.aidl" ); projecthelper.addresource( project, apklibsourcefolder.getabsolutepath(), null, resourceexclusions ); project.addcompilesourceroot( apklibsourcefolder.getabsolutepath() ); }
Original Comment: extracts apklib and adds the assets and apklib sources and resources to the build.
Generated Comment: extracts compiled from the cp compiler.
23.989707946777344
========================================
Code: public void adddefaultheader(final string name, final string value) { validate.notempty(name, "header name cannot be empty"); validate.notnull(value, "header value cannot be null, use an empty string instead"); this.checkconfigurable(); this.defaultheaders.put(name, value); }
Original Comment: adds a default header to be added to every stub http response.
Generated Comment: adds the headers to the headers.
23.846609115600586
========================================
Code: public static schema getschema(final file xsd, final errorhandler errorhandler) throws saxexception { // create a new instance for an xsd-aware schemafactory final schemafactory schemafactory = schemafactory .newinstance(http_www_w3_org_2001_xml_schema); // set the errorhandler implementation. schemafactory.seterrorhandler(errorhandler); // get the custom xsd schema that describes // the required format for my xml files. return schemafactory.newschema(xsd); }
Original Comment: gets the schema.
Generated Comment: creates a xml object from the given namespace.
23.77509880065918
========================================
Code: @override protected formatwriter createwriter(final outputstream outputstream, final formatlogger logger) { try { return new dsmlformatwriter(outputstream); } catch (final ioexception e) { logger.logerror("could not create and intialise the dsml writer", e); } return null; }
Original Comment: create the ldap writer that will dump ldap entries to a dsml file.
Generated Comment: creates a new writer as a xml file.
23.688125610351562
========================================
Code: @override public volatileimage createcompatiblevolatileimage(int width, int height, imagecapabilities caps, int transparency) throws awtexception { if (img == null) { img = new bufferedimage(1, 1, bufferedimage.type_int_argb); gc = img.creategraphics().getdeviceconfiguration(); } return gc.createcompatiblevolatileimage(width, height, caps, transparency); }
Original Comment: returns a volatile image. this method is a workaround for a classcastexception that occurs on macosx when exporting a swing ui that uses the nimbus look and feel to svg.
Generated Comment: create a new image from a new image.
23.60519790649414
========================================
Code: private static void printstacktrace(printstream out, throwable err) { out.println(err.getclass().getname() + ": " + err.getmessage()); for (stacktraceelement ste : err.getstacktrace()) { out.println("\tat " + ste.tostring()); } if (err.getcause() != null) { out.print("caused by: "); printstacktrace(out, err.getcause()); } }
Original Comment: print a complete stack trace. this differs from throwable.printstacktrace() in that it always prints all of the trace.
Generated Comment: print out a message.
23.529924392700195
========================================
</pre>
</div>
</div>
</div>
</div>
</div>
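The listing above appears to show the test examples the model scored worst on, sorted by loss from highest to lowest. As a minimal sketch (not the notebook's actual code, and with hypothetical field names), here is how such a "worst predictions" report can be produced: score each example, sort by loss descending, and print the top-k hardest cases for manual inspection.

```python
def worst_examples(records, k=3):
    """Return the k records with the highest loss, hardest first.

    Each record is a dict with 'code', 'original', 'generated', and
    'loss' keys (field names chosen for this sketch).
    """
    return sorted(records, key=lambda r: r["loss"], reverse=True)[:k]


def format_report(records):
    """Render records in the same layout as the output above."""
    lines = []
    for r in records:
        lines.append(f"Code: {r['code']}")
        lines.append(f"Original Comment: {r['original']}")
        lines.append(f"Generated Comment: {r['generated']}")
        lines.append(str(r["loss"]))
        lines.append("=" * 40)
    return "\n".join(lines)


# Toy records standing in for (code, reference comment, model comment, loss).
records = [
    {"code": "def a(): pass", "original": "does a.", "generated": "runs a.", "loss": 12.5},
    {"code": "def b(): pass", "original": "does b.", "generated": "runs b.", "loss": 24.9},
    {"code": "def c(): pass", "original": "does c.", "generated": "runs c.", "loss": 18.1},
]

print(format_report(worst_examples(records, k=2)))
```

Inspecting the highest-loss cases like this is a cheap way to spot systematic failure modes, such as the generic "gets the X from the X" style comments visible above.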
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="What's-Next?">What's Next?<a class="anchor-link" href="#What's-Next?"> </a></h1><p>If you'd like to see how you can integrate this code comment summarizer model into the popular VSCode IDE, check out my video that goes over just that!</p>
<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/SYjgPjQ-vbc" frameborder="0" allowfullscreen=""></iframe>
</center>
</div>
</div>
</div>
</div>
<script type="application/vnd.jupyter.widget-state+json">
{"3b857331a30c40be8c2d72aa74f3dc92": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_c3c9713ede3a4ba58f7e28a2b507a74d", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_a9ce294b6d1847168505f7b7933e451c", "IPY_MODEL_430af23edb5d48c0a2645dc4c7ef8389"]}}, "c3c9713ede3a4ba58f7e28a2b507a74d": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "a9ce294b6d1847168505f7b7933e451c": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_2321ec56ad9b4d58a281aec3c3bac773", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 994, 
"_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 994, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_b02957dfe8034b5fa03112b8470eb5ae"}}, "430af23edb5d48c0a2645dc4c7ef8389": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_b2ed68215d0243749552e3532870e518", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 994/994 [00:04<00:00, 200.77it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_d9d3d49b33b1445192e1d128407e3d31"}}, "2321ec56ad9b4d58a281aec3c3bac773": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "b02957dfe8034b5fa03112b8470eb5ae": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": 
"LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "b2ed68215d0243749552e3532870e518": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "d9d3d49b33b1445192e1d128407e3d31": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "806159a1e8e7483cb77e9fab5e75d380": 
{"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_f4c2498ed19a4a46bd1af0fc3c9ee076", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_1ba28592d96747ffb04b06739074c16f", "IPY_MODEL_020cbd0367f94df3a37f9613845078bb"]}}, "f4c2498ed19a4a46bd1af0fc3c9ee076": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "1ba28592d96747ffb04b06739074c16f": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_44a84637344c4385a0015e70c7f5939f", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 33, "_view_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "value": 33, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_11070535e6674643954971d874a630f3"}}, "020cbd0367f94df3a37f9613845078bb": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_d406d02f151f4a838ccb77f096f7ceb9", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 33/33 [00:00<00:00, 85.45it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_b2a2367d814b41a8a0ec8b5c30e99715"}}, "44a84637344c4385a0015e70c7f5939f": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "11070535e6674643954971d874a630f3": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": 
null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "d406d02f151f4a838ccb77f096f7ceb9": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "b2a2367d814b41a8a0ec8b5c30e99715": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "f0bfb5447b374b45932afed6489a67e8": {"model_module": "@jupyter-widgets/controls", "model_name": 
"HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_d578ec34916f446abbb5aaf17817a426", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_f4fdb2a5559b482e84f9e7465071782c", "IPY_MODEL_681dbea82baf4eb082230c640d1b4ec9"]}}, "d578ec34916f446abbb5aaf17817a426": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "f4fdb2a5559b482e84f9e7465071782c": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_c4113096379f464b9bb2158ef74852fc", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 64, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 64, "_view_count": 
null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_4459d6897c7b42bdadabc914a087653d"}}, "681dbea82baf4eb082230c640d1b4ec9": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_774fc47921074a809d8c82c9d814c5f9", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 64/64 [00:00<00:00, 238.14it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_848f6db15b2344c58d64d4f87dd48cbf"}}, "c4113096379f464b9bb2158ef74852fc": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "4459d6897c7b42bdadabc914a087653d": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": 
null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "774fc47921074a809d8c82c9d814c5f9": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "848f6db15b2344c58d64d4f87dd48cbf": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "534f9f4fc88e4e81a72e9d81b9a5648d": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", 
"_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_29122410a67c4d729f961b5f5570e3e4", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_d873a9252af64730a799f561ada8677f", "IPY_MODEL_8e55ea779e304a8299bad7e052640a3c"]}}, "29122410a67c4d729f961b5f5570e3e4": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "d873a9252af64730a799f561ada8677f": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_45aac35c512948ea964d175cb7353fe9", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 3580, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 3580, "_view_count": null, "_view_module_version": "1.5.0", 
"orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_37fea365dff1432e8c2982a10ee5307f"}}, "8e55ea779e304a8299bad7e052640a3c": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_e1af434e3b3644f99534e97cfea29533", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 3580/3580 [00:00<00:00, 41481.30it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_2782a4ec534d41faad0a26227706f274"}}, "45aac35c512948ea964d175cb7353fe9": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "37fea365dff1432e8c2982a10ee5307f": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, 
"height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "e1af434e3b3644f99534e97cfea29533": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "2782a4ec534d41faad0a26227706f274": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "013a3c838d7c4571946d62882d6cf1a0": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": 
"HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_04879c3a956b4eca8eeec56a47d926a9", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_36bee4dd22ab4c0fbe18f7238d1a5919", "IPY_MODEL_5ce34677aab94e6da06c14b4629cf5b1"]}}, "04879c3a956b4eca8eeec56a47d926a9": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "36bee4dd22ab4c0fbe18f7238d1a5919": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_a6b4336146b442c99f43132e1f461c38", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 104, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 104, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, 
"description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_67e1dbac03ef4db097f2b3b236c21ee8"}}, "5ce34677aab94e6da06c14b4629cf5b1": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_c7ed2d7677954b0ca6be318cfa64fcc5", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 104/104 [00:00<00:00, 2280.36it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_25e7d3db62db4215a3ae03d0341f5f53"}}, "a6b4336146b442c99f43132e1f461c38": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "67e1dbac03ef4db097f2b3b236c21ee8": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": 
null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "c7ed2d7677954b0ca6be318cfa64fcc5": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "25e7d3db62db4215a3ae03d0341f5f53": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "34a1861eb41445db9993b8bda487527f": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": 
"@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_fa5d8c90ad004919b805b6e30ad54bb7", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_3507efb456534046b57f7ad7f8537259", "IPY_MODEL_62b0b88a2901493ab46161b2bf495392"]}}, "fa5d8c90ad004919b805b6e30ad54bb7": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "3507efb456534046b57f7ad7f8537259": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_ecc567a207b14690814d5e677d40f182", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 221, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 221, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, 
"_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_c22314a270fe43669ec5bcb3dac1abd1"}}, "62b0b88a2901493ab46161b2bf495392": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_46e8a13792354f70bed47c145f5eccd8", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 221/221 [00:00<00:00, 1687.63it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_b18d80cf221c498fa6a5497788dbba02"}}, "ecc567a207b14690814d5e677d40f182": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "c22314a270fe43669ec5bcb3dac1abd1": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, 
"grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "46e8a13792354f70bed47c145f5eccd8": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "b18d80cf221c498fa6a5497788dbba02": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "b9631737b6a540279c81cc78b76dfd71": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_c48ead07233c4533beae6c158bd5afbc", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_427c73b836c94b9c83c2b46ce30f6c2d", "IPY_MODEL_e657e15687b1482ea2135061ebe2b7fd"]}}, "c48ead07233c4533beae6c158bd5afbc": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "427c73b836c94b9c83c2b46ce30f6c2d": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_5b0c206bf22a43bfb448060ebe07fd9d", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 498, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 498, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": 
"@jupyter-widgets/controls", "layout": "IPY_MODEL_1a5e76b1a41c4064bc7da2bf24959700"}}, "e657e15687b1482ea2135061ebe2b7fd": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_bbe077c87847482e9bbe09dbfb29e9d7", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 498/498 [00:00<00:00, 5.06kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_36431709c7ac4b1cb2a35d3a1098a3a2"}}, "5b0c206bf22a43bfb448060ebe07fd9d": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "1a5e76b1a41c4064bc7da2bf24959700": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, 
"max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "bbe077c87847482e9bbe09dbfb29e9d7": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "36431709c7ac4b1cb2a35d3a1098a3a2": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "729d79ff96cf4688883dd1c9c1049f2e": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", 
"_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_05fe2a012ec04b2a94ba43fc6e4518e3", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_ff9a426de0a741d8acb77619a5165e94", "IPY_MODEL_aa326754c1b7474baeaa26297b8f6f52"]}}, "05fe2a012ec04b2a94ba43fc6e4518e3": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "ff9a426de0a741d8acb77619a5165e94": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_2d16b90f3fef416ea1f14ed9d8b1dc66", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 898822, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 898822, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", 
"layout": "IPY_MODEL_4aa997f4850646a8b958ef0e6a57e431"}}, "aa326754c1b7474baeaa26297b8f6f52": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_e156ccff41ee49049f495991d19ea15a", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 899k/899k [00:00<00:00, 2.20MB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_e239fed6374d4a6b8c9f8eb89387985a"}}, "2d16b90f3fef416ea1f14ed9d8b1dc66": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "4aa997f4850646a8b958ef0e6a57e431": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": 
null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "e156ccff41ee49049f495991d19ea15a": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "e239fed6374d4a6b8c9f8eb89387985a": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "bbeb816f10db4168852773dc2ac0df89": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, 
"_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_d439aa8a05164bfca299c371f30de5c4", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_8abfea12c4844dc29cb0158038012925", "IPY_MODEL_1398640d480a4053abf7f88fb9e3a7a6"]}}, "d439aa8a05164bfca299c371f30de5c4": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "8abfea12c4844dc29cb0158038012925": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_63650038ce6f453cbf4ee5d093ab4948", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 456318, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 456318, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": 
"IPY_MODEL_3badcaa03b3d4bfe828381b3a4428855"}}, "1398640d480a4053abf7f88fb9e3a7a6": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_492c9f5aead44f70a98ffe9492abcd00", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 456k/456k [00:00<00:00, 2.32MB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_6ca4e136ecfe4360abedd73054ae7bb2"}}, "63650038ce6f453cbf4ee5d093ab4948": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "3badcaa03b3d4bfe828381b3a4428855": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, 
"_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "492c9f5aead44f70a98ffe9492abcd00": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "6ca4e136ecfe4360abedd73054ae7bb2": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "b5fbe640e22a46948e7855c10870a3f2": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, 
"_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_084398d996914a68ac917bfb525d4cd9", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_04de005ab4d44df8b06913d3119adebd", "IPY_MODEL_b2449d9d10964fa687de4b31836726e9"]}}, "084398d996914a68ac917bfb525d4cd9": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "04de005ab4d44df8b06913d3119adebd": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_550e86e254914838be33eb4bf5d6091c", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 150, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 150, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": 
"IPY_MODEL_9c5aaf17dd644f808bc202db261158b5"}}, "b2449d9d10964fa687de4b31836726e9": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_f4bfc99507244ec893cd2e81e1fb1517", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 150/150 [00:00<00:00, 1.58kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_0839beac94ff49439df6ff4bc127c974"}}, "550e86e254914838be33eb4bf5d6091c": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "9c5aaf17dd644f808bc202db261158b5": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, 
"_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "f4bfc99507244ec893cd2e81e1fb1517": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "0839beac94ff49439df6ff4bc127c974": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "c1168c1790f24e5f96868a35c5faddc2": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, 
"_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_f09b0dbbfe3a4d608b9287701f4a6920", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_53b2811b57b146fa9bafdf31c3a970fc", "IPY_MODEL_f60133fa3c8f4dedafb18c5fea2f8237"]}}, "f09b0dbbfe3a4d608b9287701f4a6920": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "53b2811b57b146fa9bafdf31c3a970fc": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_fb0434cf2c054be3a435ab911b181bdb", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 25, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 25, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": 
"IPY_MODEL_9dd437cdab68446e9bdeb0c823f80add"}}, "f60133fa3c8f4dedafb18c5fea2f8237": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_13b3ef9bb5d44e20a97388632a544c20", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 25.0/25.0 [00:00<00:00, 221B/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_45bbf7337d5d437696a46c290733d08f"}}, "fb0434cf2c054be3a435ab911b181bdb": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "9dd437cdab68446e9bdeb0c823f80add": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, 
"_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "13b3ef9bb5d44e20a97388632a544c20": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "45bbf7337d5d437696a46c290733d08f": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "6f8a5e3468d946ff9b081efe89dfeeaa": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, 
"_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_39ba681b1ed6459d92606fc654dcdba8", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_e529b42e3cbb4b4ba09acd901a0894f2", "IPY_MODEL_9edee620640c4e3e801b50db0c862d3c"]}}, "39ba681b1ed6459d92606fc654dcdba8": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "e529b42e3cbb4b4ba09acd901a0894f2": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_eaedec158070463bbd78ef17de3171b9", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 10, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 10, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": 
"IPY_MODEL_e8e02586085d44db937e917c1413b5dd"}}, "9edee620640c4e3e801b50db0c862d3c": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_b63870f196d3457587b7dfe4616a61bb", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 10/10 [00:03<00:00, 2.77it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_8d170d9a18e64415a6398a78c5c0f04c"}}, "eaedec158070463bbd78ef17de3171b9": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "e8e02586085d44db937e917c1413b5dd": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, 
"_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "b63870f196d3457587b7dfe4616a61bb": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "8d170d9a18e64415a6398a78c5c0f04c": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "1dbf5adf34924620b231767d3e79bb6b": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, 
"_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_333382f4328f4db5ac6e413c0e351aa7", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_5eb687dae6f548329203b55b39dce3c1", "IPY_MODEL_26ba299915aa4ee397abcd561eb36cd2"]}}, "333382f4328f4db5ac6e413c0e351aa7": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "5eb687dae6f548329203b55b39dce3c1": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_3783617f671643e0a06b0b96ecde4bd8", "_dom_classes": [], "description": "100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 88, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 88, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": 
"IPY_MODEL_8701d0581cca4b0095c67c5a99ec9b8c"}}, "26ba299915aa4ee397abcd561eb36cd2": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_644a8d4ce6204e4da65d82c9a22f6834", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 88/88 [00:33<00:00, 2.62it/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_eb55290567c54e64bd87bcccab451a20"}}, "3783617f671643e0a06b0b96ecde4bd8": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "8701d0581cca4b0095c67c5a99ec9b8c": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, 
"_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "644a8d4ce6204e4da65d82c9a22f6834": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "eb55290567c54e64bd87bcccab451a20": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}}
</script>
</code></p></div></div></div></div>Open-Dialog Chatbots for Learning New Languages [Part 1]2020-05-12T00:00:00-05:002020-05-12T00:00:00-05:00https://nathancooper.io/i-am-a-nerd/chatbot/deep-learning/gpt2/2020/05/12/chatbot-part-1<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: _notebooks/2020-05-12-chatbot-part-1.ipynb
-->
<div class="container" id="notebook-container">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<blockquote><p>Image by <a href="https://pixabay.com/users/mohamed_hassan-5229782/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=3589528">mohamed Hassan</a> from <a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=3589528">Pixabay</a></p>
</blockquote>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="This-notebook-was-adapted-from-the-following-project:">This notebook was adapted from the following project:<a class="anchor-link" href="#This-notebook-was-adapted-from-the-following-project:"> </a></h2><ol>
<li><a href="https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py">https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py</a></li>
</ol>
<p>Original license of the project this notebook was adapted from: <a href="https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py">https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py</a></p>
<h3 id="LICENSE">LICENSE<a class="anchor-link" href="#LICENSE"> </a></h3>
<pre><code># Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.</code></pre>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="About">About<a class="anchor-link" href="#About"> </a></h1><p>Hola! Today we will be creating a chatbot, but not just any chatbot. In this tutorial, you will create your own open-dialog chatbot, one that doesn't just have premade responses to very specific questions or commands!</p>
<p>The overall goal of this tutorial is to create a language learning companion where you can practice simple conversations in a language you care about. We will focus on the beautiful Spanish language in this series, as I have been trying to learn the language for the past 5 years; however, you should be able to adapt this tutorial to other languages as well.</p>
<p>First we are going to cover some of the background material for how all this works (if you are already familiar with the GPT2 model, go ahead and skip this background section). Let's get to it!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Background">Background<a class="anchor-link" href="#Background"> </a></h1>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="What-is-GPT2?">What is GPT2?<a class="anchor-link" href="#What-is-GPT2?"> </a></h2><p>In this post, we are going to use the GPT2 model (Generative Pre-Training 2), from the amazing paper <a href="https://openai.com/blog/better-language-models/">"Language Models are Unsupervised Multitask Learners"</a> by Alec Radford et al. I will be giving a brief overview of this model. However, if you want a more in-depth explanation I highly recommend the blog post <a href="http://jalammar.github.io/illustrated-gpt2/">"The Illustrated GPT-2"</a> by Jay Alammar.</p>
<p>GPT2 is what is called an autoregressive language model. This may sound complicated, but it is actually quite simple, so let's break down what this means. Autoregressive means that the output of the model is fed back into the model as input. Here is a nice example of how that works:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><img src="https://github.com/ncoop57/i-am-a-nerd/blob/master/images/autoregressive.gif?raw=1" alt="" /></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<blockquote><p>Image From <a href="https://deepmind.com/blog/article/wavenet-generative-model-raw-audio">Deepmind</a></p>
</blockquote>
<p>Now, a language model is usually some statistical model that gives the probability of some word given the context. So, take the following example:</p>
<blockquote><p><em>An [blank] a day keeps the doctor away</em></p>
</blockquote>
<p>A good language model would give a higher probability to the word "apple" occurring in the [blank] than to the word "crocodile", since encountering a crocodile daily would likely have the opposite effect.</p>
<p>Putting them together, we get an autoregressive language model where given some context</p>
<blockquote><p><em>How much wood could a woodchuck chuck, if a woodchuck could [blank]</em></p>
</blockquote>
<p>The statistical model then gives some probability to what the next word will be, which we will use in selecting the word. Once we have the selection we add it to our sentence and repeat the whole process again!</p>
<blockquote><p><em>How much wood could a woodchuck chuck, if a woodchuck could chuck [blank]</em></p>
</blockquote>
<p>Now, to train our autoregressive language model we just need to get a bunch of example sentences or just chunks of text, hide the last word, and use these sentences with the missing word as our inputs and the last words as the target. This is essentially the whole idea behind GPT2 and many other autoregressive language models, where they learn how language works by using the context to infer the next word.</p>
</div>
</div>
</div>
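<p>The loop described above can be sketched in a few lines of plain Python. To be clear, this is a toy illustration, not GPT2: the bigram table and its probabilities are invented for the apple-a-day example, the model only conditions on the single previous word instead of the whole context, and it uses greedy decoding (always pick the most likely next word) for determinism.</p>

```python
# Toy "language model": next-word probabilities keyed only on the previous
# word. These numbers are made up for illustration -- a real model like
# GPT2 conditions on the entire context, not just one word.
bigram_probs = {
    "an":     {"apple": 0.6, "egg": 0.3, "crocodile": 0.1},
    "apple":  {"a": 0.9, "pie": 0.1},
    "a":      {"day": 1.0},
    "day":    {"keeps": 1.0},
    "keeps":  {"the": 1.0},
    "the":    {"doctor": 1.0},
    "doctor": {"away": 1.0},
}

def generate(start, max_words=10):
    """Autoregressive loop: predict the next word, append it, repeat."""
    words = [start]
    for _ in range(max_words):
        dist = bigram_probs.get(words[-1])
        if dist is None:                     # no known continuation -> stop
            break
        next_word = max(dist, key=dist.get)  # greedy decoding
        words.append(next_word)              # feed the output back in
    return " ".join(words)

print(generate("an"))  # -> an apple a day keeps the doctor away
```

<p>Each pass through the loop plays the role of one forward pass of the model: the text so far goes in, one more word comes out, and the extended text becomes the next input.</p>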
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="GPT2-as-a-chatbot">GPT2 as a chatbot<a class="anchor-link" href="#GPT2-as-a-chatbot"> </a></h2><p>Great, so you may be asking yourself, "how do we use GPT2 as a chatbot?" To answer this question we need to turn our attention to another paper, <a href="https://arxiv.org/abs/1911.00536">"DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation"</a>. To see how we can repurpose this generator, GPT2, look at the following example:</p>
<blockquote><p><em>Hi, how are you? [end_of_turn] I'm good, what about you? [end_of_turn] Not so good, lots of long nights at work. [end_of_turn] Darn, that sucks :( [end_of_conversation]</em></p>
</blockquote>
<p>This is a sample conversation between two speakers. What's special about it is that there are special tokens that signify when one of the speakers has finished talking, which we in the biz call a turn. If we treat this example like our previous one with the autoregressive language model, we can do some interesting things:</p>
<blockquote><p><em>Hi, how are you? [end_of_turn] [blank]</em></p>
</blockquote>
<p>If we use the same logic as we did previously, it is easy to see how we can now use GPT2 to guess the next word in this conversation.</p>
<blockquote><p><em>Hi, how are you? [end_of_turn] I'm [blank]</em></p>
</blockquote>
<p>We keep feeding back the prediction of our model and there ya have it! A chatting GPT2, where all we need to do is show the model a bunch of these example conversations and have it predict the next word in the conversation.</p>
<p>I think that is plenty of background, we will revisit exactly how we design a system where we actually hold a conversation with GPT2 once we have the model trained ;).</p>
</div>
</div>
</div>
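<p>The turn-separated format above can be sketched with a small helper. DialoGPT reuses GPT-2's end-of-text token as the turn separator; the token string below matches GPT-2's vocabulary, but treat the exact formatting as an assumption and double-check it against the tokenizer you actually load.</p>

```python
# End-of-text token from GPT-2's vocabulary, reused by DialoGPT as an
# end-of-turn marker (an assumption to verify against your tokenizer).
EOS = "<|endoftext|>"

def flatten_conversation(turns):
    """Flatten a list of speaker turns into one training string,
    with the end-of-turn token after every turn."""
    return EOS.join(turns) + EOS

def build_prompt(history):
    """At chat time: flatten the conversation so far; the model then
    generates the next turn word by word until it emits EOS."""
    return flatten_conversation(history)

turns = ["Hi, how are you?", "I'm good, what about you?"]
print(flatten_conversation(turns))
# -> Hi, how are you?<|endoftext|>I'm good, what about you?<|endoftext|>
```

<p>Training examples are just these flattened strings, so the same next-word objective from before teaches the model both what to say and when a turn ends.</p>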
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> pip -q install <span class="nv">transformers</span><span class="o">==</span><span class="m">2</span>.9.0 gdown
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre> |████████████████████████████████| 645kB 5.5MB/s
|████████████████████████████████| 1.0MB 44.6MB/s
|████████████████████████████████| 890kB 45.8MB/s
|████████████████████████████████| 3.8MB 34.9MB/s
Building wheel for sacremoses (setup.py) ... done
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's define some configuration variables so we don't have a bunch of magic numbers and strings!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Args to allow for easy conversion of python script to notebook</span>
<span class="k">class</span> <span class="nc">Args</span><span class="p">():</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span> <span class="o">=</span> <span class="s1">'output'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">model_type</span> <span class="o">=</span> <span class="s1">'gpt2'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">model_name_or_path</span> <span class="o">=</span> <span class="s1">'microsoft/DialoGPT-small'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">config_name</span> <span class="o">=</span> <span class="s1">'microsoft/DialoGPT-small'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tokenizer_name</span> <span class="o">=</span> <span class="s1">'microsoft/DialoGPT-small'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">cache_dir</span> <span class="o">=</span> <span class="s1">'cached'</span>
<span class="bp">self</span><span class="o">.</span><span class="n">block_size</span> <span class="o">=</span> <span class="mi">512</span>
<span class="bp">self</span><span class="o">.</span><span class="n">do_train</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">do_eval</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">evaluate_during_training</span> <span class="o">=</span> <span class="kc">False</span>
<span class="bp">self</span><span class="o">.</span><span class="n">per_gpu_train_batch_size</span> <span class="o">=</span> <span class="mi">4</span>
<span class="bp">self</span><span class="o">.</span><span class="n">per_gpu_eval_batch_size</span> <span class="o">=</span> <span class="mi">4</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span> <span class="o">=</span> <span class="mi">1</span>
<span class="bp">self</span><span class="o">.</span><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">5e-5</span>
<span class="bp">self</span><span class="o">.</span><span class="n">weight_decay</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">adam_epsilon</span> <span class="o">=</span> <span class="mf">1e-8</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_grad_norm</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">num_train_epochs</span> <span class="o">=</span> <span class="mi">3</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_steps</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="bp">self</span><span class="o">.</span><span class="n">warmup_steps</span> <span class="o">=</span> <span class="mi">0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">logging_steps</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="bp">self</span><span class="o">.</span><span class="n">save_steps</span> <span class="o">=</span> <span class="mi">3500</span>
<span class="bp">self</span><span class="o">.</span><span class="n">save_total_limit</span> <span class="o">=</span> <span class="kc">None</span>
<span class="bp">self</span><span class="o">.</span><span class="n">eval_all_checkpoints</span> <span class="o">=</span> <span class="kc">False</span>
<span class="bp">self</span><span class="o">.</span><span class="n">no_cuda</span> <span class="o">=</span> <span class="kc">False</span>
<span class="bp">self</span><span class="o">.</span><span class="n">overwrite_output_dir</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">overwrite_cache</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">should_continue</span> <span class="o">=</span> <span class="kc">False</span>
<span class="bp">self</span><span class="o">.</span><span class="n">seed</span> <span class="o">=</span> <span class="mi">42</span>
<span class="bp">self</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="bp">self</span><span class="o">.</span><span class="n">fp16</span> <span class="o">=</span> <span class="kc">False</span>
<span class="bp">self</span><span class="o">.</span><span class="n">fp16_opt_level</span> <span class="o">=</span> <span class="s1">'O1'</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">Args</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="The-Data!">The Data!<a class="anchor-link" href="#The-Data!"> </a></h1><p>To train our chatbot we will be using conversations scraped from subtitles of Spanish TV shows and movies. I've gone ahead and formatted the data for us already; however, if you would like to train your chatbot in a different language, you can use <a href="https://colab.research.google.com/drive/1kKErlSSpewQbWexFPEj1rPWsYpMx69ZS?usp=sharing">this script</a> to generate a CSV with the same format I use in the rest of this tutorial.</p>
</div>
</div>
</div>
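If you're curious what that format looks like under the hood, here's a rough sketch of how one conversation could become a single CSV row: a <code>response</code> column holding the last utterance, a <code>context</code> column holding the one before it, and <code>context/0</code> through <code>context/9</code> walking further back in time. The helper name and sample utterances here are my own illustration, not taken from the script itself.

```python
# Hypothetical helper sketching the CSV row layout used in this tutorial:
# "response" is the final utterance, "context" the one just before it,
# and "context/0" ... "context/9" step backwards through earlier turns.
def conversation_to_row(utterances):
    """Map a list of utterances (oldest first) to one training row."""
    row = {"response": utterances[-1], "context": utterances[-2]}
    for i, utt in enumerate(reversed(utterances[:-2])):
        row[f"context/{i}"] = utt
    return row

row = conversation_to_row([f"line {i}" for i in range(12)])
print(sorted(row.keys()))
```

A 12-utterance window therefore yields exactly the 12 columns you can see in the dataframe preview below.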
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> gdown https://drive.google.com/uc?id<span class="o">=</span>1Lp-diuMohUTGyB9BSTFgeGZyY3dkNuEg
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Downloading...
From: https://drive.google.com/uc?id=1Lp-diuMohUTGyB9BSTFgeGZyY3dkNuEg
To: /content/final_es_conv.csv
20.3MB [00:00, 55.6MB/s]
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s1">'final_es_conv.csv'</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="n">trn_df</span><span class="p">,</span> <span class="n">val_df</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">test_size</span> <span class="o">=</span> <span class="mf">0.2</span><span class="p">)</span>
<span class="n">trn_df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea output_execute_result">
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>response</th>
<th>context</th>
<th>context/0</th>
<th>context/1</th>
<th>context/2</th>
<th>context/3</th>
<th>context/4</th>
<th>context/5</th>
<th>context/6</th>
<th>context/7</th>
<th>context/8</th>
<th>context/9</th>
</tr>
</thead>
<tbody>
<tr>
<th>36917</th>
<td>Es tan simple.</td>
<td>No se que te detiene.</td>
<td>¡Muy persuasiva!</td>
<td>No es crimen librarse de una alimaña.</td>
<td>¿Por qué tener lastima de un hombre tan vil?</td>
<td>Además, también ha puesto sus ojos en Okayo.</td>
<td>Hace 4 años que soy victima de mi marido.</td>
<td>Solo estoy siendo franca contigo.</td>
<td>Calmate.</td>
<td>¡Eres peor que el diablo!</td>
<td>¿Comprendes?</td>
<td>Okayo me recuerda constantemente mi fracaso.</td>
</tr>
<tr>
<th>5449</th>
<td>Muy torpe, Joyce.</td>
<td>A la sala de interrogación rápido. ¡Muévanse!</td>
<td>De pie, muchachos.</td>
<td>A la sala de interrogación rápido. ¡Muévanse!</td>
<td>De pie, muchachos.</td>
<td>¡Use su cuchillo, hombre!</td>
<td>¡Adelántese, Thomson!</td>
<td>¡Bien hecho, Jenkins!</td>
<td>Gracias.</td>
<td>Muy bien.</td>
<td>El bungaló del mayor Warden está al final del ...</td>
<td>Continúe, conductor.</td>
</tr>
<tr>
<th>37004</th>
<td>Pídemelo.</td>
<td>Sólo lo que quieras tú.</td>
<td>Ya no soy yo.</td>
<td>Eres preciosa y maravillosa.</td>
<td>¿No?</td>
<td>Así te gustaré.</td>
<td>Haré y diré lo que quieras.</td>
<td>Nunca.</td>
<td>Así nunca querrás estar con otras, ¿verdad?</td>
<td>Siempre diré lo que tú desees y haré lo que tú...</td>
<td>Pero yo sí.</td>
<td>Pero...</td>
</tr>
<tr>
<th>47077</th>
<td>¡Boris!</td>
<td>¡Nicolás, que alegría a mi corazón, volviste!</td>
<td>¡Regresan los Vencedores!</td>
<td>¡Miren!</td>
<td>¡Ahí vienen!</td>
<td>Está vivo.</td>
<td>Boris está vivo.</td>
<td>Dasha prometió avisarme cuando regrese.</td>
<td>Pero, en la fábrica dicen que él está en una u...</td>
<td>Tampoco hay noticias de Stepan.</td>
<td>¡Quién sabe!</td>
<td>¿Por qué entonces, no hay noticias de él?</td>
</tr>
<tr>
<th>41450</th>
<td>Entonces por qué no estamos en mejor situación...</td>
<td>Dora Hartley era una buena prueba.</td>
<td>Mire, lo que hace usted creer ¿Qué los indios ...</td>
<td>Aleja esa arma.</td>
<td>Buenas noches.</td>
<td>Es hora de ir a la cama.</td>
<td>Seguro.</td>
<td>Sí. recuerde que es un secreto.</td>
<td>Es bonita.</td>
<td>Está bien.</td>
<td>¿Ann Martin?</td>
<td>Hola, Bax.</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nb">len</span><span class="p">(</span><span class="n">trn_df</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">val_df</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(40374, 10094)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">get_counter_and_lens</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">):</span>
<span class="n">flatten</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="p">[</span><span class="n">item</span> <span class="k">for</span> <span class="n">sublist</span> <span class="ow">in</span> <span class="n">l</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">sublist</span><span class="p">]</span>
<span class="n">toks</span> <span class="o">=</span> <span class="p">[</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">len</span><span class="p">,</span> <span class="n">toks</span><span class="p">)),</span> <span class="n">Counter</span><span class="p">(</span><span class="n">flatten</span><span class="p">(</span><span class="n">toks</span><span class="p">)),</span> <span class="n">Counter</span><span class="p">(</span><span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">())</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">AutoTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span> <span class="n">cache_dir</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">cache_dir</span><span class="p">)</span>
<span class="n">lens</span><span class="p">,</span> <span class="n">tok_cnt</span><span class="p">,</span> <span class="n">word_cnt</span> <span class="o">=</span> <span class="n">get_counter_and_lens</span><span class="p">(</span><span class="n">trn_df</span><span class="p">[</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">)),</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">),</span> <span class="n">tokenizer</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">plot_counts</span><span class="p">(</span><span class="n">counts</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">):</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">values</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">counts</span><span class="o">.</span><span class="n">most_common</span><span class="p">()[:</span><span class="n">top_k</span><span class="p">])</span>
<span class="n">indexes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
<span class="n">width</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">num</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">22</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">60</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'w'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s1">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">indexes</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">width</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">indexes</span> <span class="o">+</span> <span class="n">width</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">plot_counts</span><span class="p">(</span><span class="n">tok_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
<span class="n">plot_counts</span><span class="p">(</span><span class="n">word_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDAAAADRCAYAAAA6w0IiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3de1iVZb7/8TcgeAwULaPQ3Htkwqsry9JKEEFNY+OhMqGD4ribDtpMkzM5ObXLUvLKrMumPdPY3qnplB3AjlsbD4R4ALdmc2lZ2ZgzFmInBwMzJwT5/dEPRtlCkQvW0t6vv+Th4f7e93Nyrc+6n2eF1dTU1CBJkiRJkhTCwoPdAUmSJEmSpG9jgCFJkiRJkkKeAYYkSZIkSQp5BhiSJEmSJCnkGWBIkiRJkqSQ16qxX3766adceeWVREZGEhERweLFi7n22muprq4mIiKCn/70p2RnZ/PJJ58wfvx4Dhw4wKRJkxg3bhzV1dXceOON7NixgwsvvJDf/va3ADz66KPk5ubSuXNnnn76aaKjo1m/fj133HEH4eHhzJ07l3PPPfeY/bnwwgv50Y9+FPitIEmSJEmSQsLOnTt58803/8/ysMa+RrW6upqwsDDCw8NZuHAhu3fvJj8/n6VLl9KhQ4e69X75y18yfPhw0tLSSElJYfXq1axYsYI33niD+++/nxtvvJHrr7+ehIQErr76avLz83nmmWf46KOPuPPOO0lNTeXll19m//79TJw4kddee+2Y/cnKyiI3NzcAm0OSJEmSJIWiht77N3oLSUREBOHh36yyf/9+zjnnHMLDw8nIyGDUqFF8+OGHAGzatInBgwfTqlUr+vbty7Zt2yguLmbYsGEApKenU1RUxBtvvEFqaiphYWF1yw4ePEhERASdOnWie/fulJWVBXrskiRJkiTpBNfoLSQAW7Zs4eabb+aLL75g5cqV5OXl0blzZ9asWcOtt97Kq6++yqFDh+qCjpiYGMrKyti3bx/R0dFNWgbQqlUrKisriYqKAiAvL4+8vDwASkpKAjt6SZIkSZJ0QvjWh3ief/75bNy4kZycHB544AE6d+4MQGpqKnv27AEgMjKSw4cPA1BeXk5sbCwdO3akoqKiScsAqqqq6sILgMzMTHJzc8nNzaVbt24BGrYkSZIkSTqRNBpgVFZW1v07JiaGdu3a1YUN7777Lp06dQKgX79+FBYWUlVVxZtvvsk555xDUlIS+fn5AKxYsYLk5GT69evH2rVrj1rWrl07qqqq+OKLLygpKSE2NrZZBipJkiRJkk5cjd5CsmXLFqZMmUJERARt2rRhwYIFDB48mLZt2wLw2GOPATB16lTGjx/P3XffzcSJE2nbti0jRozg5ZdfJiUlhT59+tC/f38Ahg8fTnJyMp06dWLx4sUA3H///WRkZBAWFsYf/vCH5hyvJEmSJEk6ATX6LSShxm8hkSRJkiTp5Pa9voVEkiRJkiQpFHzrt5AoMHr8ZlnQau+aNTxotSVJkiRJCgRnYEiSJEmSpJBngCFJkiRJkkKeAYYkSZIkSQp5BhiSJEmSJCnkGWBIkiRJkqSQZ4AhSZIkSZJCngGGJEmSJEkKeQYYkiRJkiQp5BlgSJIkSZKkkGeAIUmSJEmSQp4BhiRJkiRJCnkGGJIkSZIkKeQZYEiSJEmSpJBngCFJkiRJkkKeAYYkSZIkSQp5jQYYn376KUlJSaSmpjJ48GA+/vhj1q9fT1JSEgMGDODtt98G4JNPPmHYsGEkJyfz9NNPA1BdXc31119PSkoKkydPrmvz0UcfJTk5mVGjRlFRUQFwzDYlSZIkSZJqNRpgdOnShfXr17NmzRrGjx/P/Pnz+Y//+A+WLVvGM888w9SpUwF48MEHueOOO1izZg2PPfYY//jHP1i6dClnnHEG69at48CBA2zYsIG9e/fy6quvsn79eq6++moee+wx
gGO2KUmSJEmSVKvRACMiIoLw8G9W2b9/Pz/60Y+IiIigU6dOdO/enbKyMgA2bdrE4MGDadWqFX379mXbtm0UFxczbNgwANLT0ykqKuKNN94gNTWVsLCwumUHDx48ZpuSJEmSJEm1vvUZGFu2bOHiiy/m97//PUlJSURHR9f9rlWrVlRWVnLo0KG6oCMmJoaysjL27dtXt+53XXZkm5IkSZIkSbVafdsK559/Phs3biQ3N5eZM2fWPbcCoKqqiqioKCIjIzl8+DDh4eGUl5cTGxtLx44d69Y9ctkHH3zwf5Ydq81aeXl55OXlAVBSUhKYUUuSJEmSpBNKozMwjpwJERMTQ4cOHaiqquKLL76gpKSE2NhYAPr160dhYSFVVVW8+eabnHPOOSQlJZGfnw/AihUrSE5Opl+/fqxdu/aoZe3atTtmm7UyMzPJzc0lNzeXbt26BXTwkiRJkiTpxNDoDIwtW7YwZcoUIiIiaNOmDQsWLGDHjh1kZGQQFhbGH/7wBwCmTp3K+PHjufvuu5k4cSJt27ZlxIgRvPzyy6SkpNCnTx/69+8PwPDhw0lOTqZTp04sXrwYgPvvv///tClJkiRJklQrrKampibYnfiusrKyyM3NDXY3vpcev1kWtNq7Zg0PWm1JkiRJkpqioff+3/oQT0mSJEmSpGAzwJAkSZIkSSHPAEOSJEmSJIU8AwxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhzwBDkiRJkiSFPAMMSZIkSZIU8gwwJEmSJElSyDPAkCRJkiRJIc8AQ5IkSZIkhTwDDEmSJEmSFPIMMCRJkiRJUsgzwJAkSZIkSSHPAEOSJEmSJIU8AwxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhr9EAY9OmTfTv35+BAwdy7bXXcujQIRISEkhLSyMtLY1Vq1YBsH37dgYOHEhSUhKvv/46AAcOHGD06NEMGDCA2bNn17U5depUUlJSyM7O5tChQwDk5eWRlJTEkCFD2L17d3ONVZIkSZIknaAaDTC6detGQUEBa9eupUePHrzyyivExMRQWFhIYWEhQ4cOBeCuu+5i/vz5LF++nGnTpgEwb948MjIyWL9+PQUFBZSWlrJ161ZKS0tZt24diYmJLFmyhKqqKubMmUNhYSEzZswgJyen+UctSZIkSZJOKI0GGHFxcbRt2xaAqKgowsPD+fLLL0lNTeW6666jrKwMgD179pCQkEB0dDSxsbHs3buX4uJihg0bBsDQoUPZsGHDUcvS09MpKipix44d9OrVi6ioKJKTk3nrrbeac7ySJEmSJOkE9J2egfHhhx+ycuVKRo4cSVFREWvWrCE9PZ17770XgMOHD9etGxMTQ1lZGfv27SM6OrpJywCqq6uPqp2Xl0dWVhZZWVmUlJQc32glSZIkSdIJ6VsDjIqKCrKzs1m4cCGRkZF07twZgDFjxrB169ZvGgn/ZzPl5eXExsbSsWNHKioqmrQMICIi4qj6mZmZ5ObmkpubS7du3Y5zuJIkSZIk6UTUaIBRVVXFNddcw7333svZZ59NZWUlX3/9NQDr1q2jZ8+ewDe3muzcuZP9+/dTVlZGly5dSEpKIj8/H4D8/HwuueSSo5atWLGC5ORkEhISeO+996isrKS4uJjevXs353glSZIkSdIJqFVjv3z22WfZuHEjOTk55OTkMGnSJGbPnk379u1p3bo1CxYsAGDmzJlMmDCB6upqpk+fDsANN9zAuHHjWLBgASNGjCA+Pp74+Hi6du1KSkoK3bt3Z8qUKURGRjJ58mTS0tJo06YNixYtav5RS5IkSZKkE0pYTU1NTbA78V1lZWWRm5sb7G58Lz1+syxotXfNGh602pIkSZIkNUVD7/2/00M8JUmSJEmSgskAQ5IkSZIkhTwDDEmSJEmSFPIMMCRJkiRJUsgzwJAkSZIkSSHPAEOSJEmSJIU8AwxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhzwBDkiRJkiSF
PAMMSZIkSZIU8gwwJEmSJElSyDPAkCRJkiRJIc8AQ5IkSZIkhTwDDEmSJEmSFPIMMCRJkiRJUsgzwJAkSZIkSSGv0QBj06ZN9O/fn4EDB3Lttddy6NAh8vLySEpKYsiQIezevRuA7du3M3DgQJKSknj99dcBOHDgAKNHj2bAgAHMnj27rs2pU6eSkpJCdnY2hw4dAjhmm5IkSZIkSbUaDTC6detGQUEBa9eupUePHrzyyivMmTOHwsJCZsyYQU5ODgB33XUX8+fPZ/ny5UybNg2AefPmkZGRwfr16ykoKKC0tJStW7dSWlrKunXrSExMZMmSJVRVVR2zTUmSJEmSpFqNBhhxcXG0bdsWgKioKN5//3169epFVFQUycnJvPXWWwDs2bOHhIQEoqOjiY2NZe/evRQXFzNs2DAAhg4dyoYNG45alp6eTlFRETt27Dhmm5IkSZIkSbVafZeVPvzwQ1auXMmsWbP4/PPP65ZXV1cDcPjw4bplMTExlJWVsW/fPqKjo//Psri4uAbXO7LNWnl5eeTl5QFQUlLyfcYoSZIkSZJOcN8aYFRUVJCdnc3ChQuprq6moqKi7ncREREAhIf/cyJHeXk5sbGxdOzYkYqKCjp27Eh5eTlnnXUWVVVVdX9ff736bdbKzMwkMzMTgKysrOMYqiRJkiRJOlE1egtJVVUV11xzDffeey9nn302CQkJvPfee1RWVlJcXEzv3r2Bb2412blzJ/v376esrIwuXbqQlJREfn4+APn5+VxyySVHLVuxYgXJyckNtilJkiRJklSr0RkYzz77LBs3biQnJ4ecnBwmTZrE5MmTSUtLo02bNixatAiAmTNnMmHCBKqrq5k+fToAN9xwA+PGjWPBggWMGDGC+Ph44uPj6dq1KykpKXTv3p0pU6YQGRl5zDYlSZIkSZJqhdXU1NQEuxPfVVZWFrm5ucHuxvfS4zfLglZ716zhQastSZIkSVJTNPTev9FbSCRJkiRJkkKBAYYkSZIkSQp5BhiSJEmSJCnkGWBIkiRJkqSQZ4AhSZIkSZJCngGGJEmSJEkKeQYYkiRJkiQp5BlgSJIkSZKkkGeAIUmSJEmSQp4BhiRJkiRJCnkGGJIkSZIkKeQZYEiSJEmSpJBngCFJkiRJkkKeAYYkSZIkSQp5BhiSJEmSJCnkGWBIkiRJkqSQZ4AhSZIkSZJCngGGJEmSJEkKeY0GGOXl5Vx00UV06NCBbdu2AZCQkEBaWhppaWmsWrUKgO3btzNw4ECSkpJ4/fXXAThw4ACjR49mwIABzJ49u67NqVOnkpKSQnZ2NocOHQIgLy+PpKQkhgwZwu7du5tloJIkSZIk6cTVaIDRrl07li1bxpgxY+qWxcTEUFhYSGFhIUOHDgXgrrvuYv78+Sxfvpxp06YBMG/ePDIyMli/fj0FBQWUlpaydetWSktLWbduHYmJiSxZsoSqqirmzJlDYWEhM2bMICcnpxmHK0mSJEmSTkSNBhiRkZGceuqpRy378ssvSU1N5brrrqOsrAyAPXv2kJCQQHR0NLGxsezdu5fi4mKGDRsGwNChQ9mwYcNRy9LT0ykqKmLHjh306tWLqKgokpOTeeutt5pjnJIkSZIk6QTW5GdgFBUVsWbNGtLT07n33nsBOHz4cN3vY2JiKCsrY9++fURHRzdpGUB1dfVR9fLy8sjKyiIrK4uSkpKmj1CSJEmSJJ3wWjX1Dzp37gzAmDFjmDdvHgDh4f/MQcrLy4mNjaVjx45UVFTQsWNHysvLOeuss6iqqqKiouKY69WKiIg4ql5mZiaZmZkAZGVlNbW7Anr8ZlnQau+aNTxotSVJkiRJJ48mzcCorKzk66+/BmDdunX07NkTgLi4OHbu3Mn+/fspKyujS5cuJCUlkZ+fD0B+fj6XXHLJUctWrFhBcnIyCQkJvPfee1RWVlJcXEzv3r0DOT5JkiRJknQS+NYZGBkZGWzZsoX333+fK664gtzcXNq3b0/r1q1ZsGAB
ADNnzmTChAlUV1czffp0AG644QbGjRvHggULGDFiBPHx8cTHx9O1a1dSUlLo3r07U6ZMITIyksmTJ5OWlkabNm1YtGhR845YkiRJkiSdcMJqampqgt2J7yorK4vc3Nxgd+N7CeZtHMHkLSSSJEmSpKZo6L1/kx/iKUmSJEmS1NIMMCRJkiRJUshr8reQSE3hN6BIkiRJkgLBGRiSJEmSJCnkGWBIkiRJkqSQZ4AhSZIkSZJCngGGJEmSJEkKeQYYkiRJkiQp5BlgSJIkSZKkkGeAIUmSJEmSQp4BhiRJkiRJCnkGGJIkSZIkKeQZYEiSJEmSpJBngCFJkiRJkkJeq2B3QGouPX6zLGi1d80aHrTakiRJknQycgaGJEmSJEkKeQYYkiRJkiQp5DUaYJSXl3PRRRfRoUMHtm3bBkBeXh5JSUkMGTKE3bt3A7B9+3YGDhxIUlISr7/+OgAHDhxg9OjRDBgwgNmzZ9e1OXXqVFJSUsjOzubQoUMNtilJkiRJklSr0QCjXbt2LFu2jDFjxgBQVVXFnDlzKCwsZMaMGeTk5ABw1113MX/+fJYvX860adMAmDdvHhkZGaxfv56CggJKS0vZunUrpaWlrFu3jsTERJYsWdJgm5IkSZIkSbUaDTAiIyM59dRT637esWMHvXr1IioqiuTkZN566y0A9uzZQ0JCAtHR0cTGxrJ3716Ki4sZNmwYAEOHDmXDhg1HLUtPT6eoqKjBNiVJkiRJkmo16VtI9u3bR3R0dN3P1dXVABw+fLhuWUxMDGVlZUete+SyuLi4Btc7ss1aeXl55OXlAVBSUtKU7kqSJEmSpJNEkwKMjh07UlFRUfdzREQEAOHh/5zIUV5eTmxsbN26HTt2pLy8nLPOOouqqqq6v6+/Xv02a2VmZpKZmQlAVlZWE4cnSZIkSZJOBk0KMBISEnjvvfeorKxk8+bN9O7dG4C4uDh27tzJaaedRllZGV26dCEpKYn8/Hyuv/568vPzeeKJJ9i7dy9z5sxh/PjxrFixguTk5AbblE5kPX6zLGi1d80aHrTakiRJktRcvjXAyMjIYMuWLbz//vvcfPPNTJ48mbS0NNq0acOiRYsAmDlzJhMmTKC6uprp06cDcMMNNzBu3DgWLFjAiBEjiI+PJz4+nq5du5KSkkL37t2ZMmUKkZGRx2xTkiRJkiSpVlhNTU1NsDvxXWVlZZGbmxvsbnwvwfxEXj8szsCQJEmSdCJr6L1/o99CIkmSJEmSFAqa9AwMSaHP529IkiRJOhk5A0OSJEmSJIU8AwxJkiRJkhTyvIVEUsB4+4okSZKk5mKAIemkYHgiSZIkndwMMCTpOBmeSJIkSc3PZ2BIkiRJkqSQZ4AhSZIkSZJCnreQSNIJzNtXJEmS9EPhDAxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhz2dgSJK+l2A+fyOYfPaHJElScDgDQ5IkSZIkhTwDDEmSJEmSFPIMMCRJkiRJUsjzGRiSJDVBMJ/94fM3JEnSD1mTA4xdu3bRr18/zjnnHADy8vIoLCzkkUceoW3btixatIj4+Hi2b9/OTTfdRFVVFTk5OQwZMoQDBw6QnZ3NZ599xqhRo7jjjjsAmDp1KsXFxfTo0YMFCxYQGRkZ2FFKknQS8MGpkiTph+x73UKSmppKYWEhhYWFdOrUiTlz5lBYWMiMGTPIyckB4K677mL+/PksX76cadOmATBv3jwyMjJYv349BQUFlJaWsnXrVkpLS1m3bh2JiYksWbIkcKOTJEmSJEknhe91C0lRUREpKSmkpKSQnZ1Nr169iIqKIjk5mSlTpgCwZ88eEhISAIiNjWXv3r0UFxfz0EMPATB06FA2bNjA559/zrBhwwBIT0/nySef5Nprrw3E2CRJ0knA23YkSRJ8jwAjLi6ODz74gHbt2nHjjTfy4osvEh0dXff76upqAA4fPly3LCYmhrKyMvbt21e37pHL4uLijlp2pLy8PPLy8gAoKSlp
anclSZIkSdJJoMm3kLRu3Zr27dsTFhbG6NGj2bp1KxUVFXW/j4iI+Kbh8H82XV5eTmxsLB07dqxbt7FlR8rMzCQ3N5fc3Fy6devW9BFKkiRJkqQTXpNnYOzfv59TTjkFgHXr1jF8+HAef/xxKisr2bx5M7179wa+mamxc+dOTjvtNMrKyujSpQtJSUnk5+dz/fXXk5+fzxNPPMHevXuZM2cO48ePZ8WKFSQnJwd2hJIkSd/TD/XBqT9U3jIkSaGtyQHG+vXrufvuu2nXrh3/8i//Qk5ODm3atCEtLY02bdqwaNEiAGbOnMmECROorq5m+vTpANxwww2MGzeOBQsWMGLECOLj44mPj6dr166kpKTQvXv3umdoSJIkSZIk1QqrqampCXYnvqusrCxyc3OD3Y3vxU9wJEmSpNDhjBspdDX03v97fQuJJEmSJJ3I/IYj6cRjgCFJkiRJLeiHOjvb4EbHq8nfQiJJkiRJktTSDDAkSZIkSVLI8xYSSZIkSVKz+6HeOhNMJ9ttO87AkCRJkiRJIc8AQ5IkSZIkhTwDDEmSJEmSFPIMMCRJkiRJUsgzwJAkSZIkSSHPAEOSJEmSJIU8AwxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhzwBDkiRJkiSFPAMMSZIkSZIU8gwwJEmSJElSyAuZAGPq1KmkpKSQnZ3NoUOHgt0dSZIkSZIUQkIiwNi6dSulpaWsW7eOxMRElixZEuwuSZIkSZKkEBISAUZxcTHDhg0DID09naKioiD3SJIkSZIkhZJWwe4AwL59+4iLiwMgJiaGsrKyut/l5eWRl5cHwObNm8nKygpKH4/XRUGsXVJSQrdu3axtbWtb29rWtra1rW1ta1vb2j+g2v373x+02sdj586dx/5FTQh47LHHahYtWlRTU1NTs3nz5pqf/exnQe7RySUzM9Pa1ra2ta1tbWtb29rWtra1rW3tE1pI3EKSlJREfn4+ACtWrCA5OTnIPZIkSZIkSaEk4r777rsv2J04/fTTKS4uJicnh8rKSu68804iIiKC3a2TyjnnnGNta1vb2ta2trWtbW1rW9va1rb2CSuspqamJtidkCRJkiRJakxI3EIinYz27t3LI488EuxuSGoGH3/8cd2tj5IkSWoZBhhSgPz7v//7UT/PmTOHDh068OKLLwapR82r/nhPZj+ksYa6lt4XW7duZdKkSdx7770cOHCgbvlDDz1EYmJis9YO9nEXjPoNbe+WEuxt/kMW7H3f0jzWgiMY2z3Y+zrY9UPh3A72NmhpJ/t4DTCk41RUVMSgQYP429/+RmpqKi+88AIAqamp/OpXv+Lf/u3fgtzDwGpovLXGjBnDrl27Wqw/GzduJDk5mQsuuICnn346oG1/21jVcoK1L8477zy2bt1KREQE7du3Z+HChXTp0oX777+f+Ph4pkyZQmFhYUBrBvu4O1b9tLQ0vvzyy2avXX97A9TU1DBr1iwGDBjAoEGDePXVVwNeN9jbPBS11D6vdax9fzLyWAuO77rd77vvPpYuXdqiNev74osvyM3NDVr9QAvmuR0q26Cl/FDG2yrYHZBOZH//+9+55ZZbWL58OXFxcRw6dIjNmzcDkJeXR3Z2Nq+99hpXXXVVkHsaGI2NN1ji4uIoKCigpqaGlJQUxo0bF5B2Q3GsP1TB3BclJSXEx8dTWFjItGnTgG8ePD1//nxuvfXWgNcL9nEX7PrH2t6PPvoo1dXVtGvXjpqaGlavXk2XLl1ISkoKSM1gj1nfONa+D7bDhw8THh64z/o81oIjGNv9eGrWBhhZWVlBqR9o9c/tP/3pT/z5z3/mlFNO4Re/+EWz1Q2lbdASfkjjdQaGTjrV1dWMGzeO1NRUhg8fzr59+5qt1muvvcaVV15JXFwcAJGRkfTv35+qqip27drFPffc02K3kHz66acMGjSIlJQUxowZQ3V1dcBrNDTe/Px8LrjgAkaPHk1paSkA//jHPxg3bhyDBw9m1KhR
VFRUBLw/AN27d6d169Zs2rSJs88+O2Dtfpex9u/fn127drFw4UJ+//vfA7B06VJqv9xp4cKFpKSkkJSUREFBQcD6dqSbb765Wdo9lvrn1ptvvklSUhKDBg1q1n40tC8eeugh0tLSuOCCC1i1alWz1F6yZAljx44lMTGR7du3AzBhwgSeeuopqqqqAl6vobEe+Wl47SynhQsXctVVVzFy5Ej69evHxx9/3Gz1a7399tukpqbSv39/fv7znx93vfrqb+/a54y/9957LFq0iNdee42HHnqI//3f/w1YzYbGvHnz5rpr6sMPPwzA448/zkUXXcTgwYN56aWXAtaHWvWv4zt37myRcywUHOtca4kxH2ubDxw4kKuvvpoHH3wwoLWacqzdd999ZGdnk5GRQWpqKgcPHgxoX+qbPHlys9eor6amhltvvZVBgwZx6aWXsnv37map05TtHoya9a8rc+fOZc2aNaSlpfHuu+8GtH5aWhq/+tWvGDhwYN01vKKiglGjRpGamso111xDZWVlALbAP9U/t/v3709ZWRkdO3YMaJ36jrUNevbsyfDhw+vWGTJkSLO9Pq1/fH/00UekpaUxaNAgLr/88oDX+7Z9fskll3Dfffdx66230rdvX377298GvA8txRkYOum89NJLxMfH8/TTT/PUU0/xu9/9rtk+zdmzZ0/dhaKgoIAZM2YQHR3Nz3/+cy699FLi4uL48ssvOXjwIG3btm2WPtTq1KkTq1atolWrVtx2220UFBQwdOjQgNZoaLyfffYZ+fn5tG/fnh//+McAzJs3j8GDB3P99dfz/PPP89///d9MmTIloP2p9fnnn/PrX/86YNM+oWljPZa///3vPPfcc6xdu5avvvqK4cOHM3jw4ID1r1bfvn0D3mZD6p9bS5cuZdy4cdxyyy0cPny42eo2tC+ee+45fv3rX/PZZ5+RmZkZ8OMdYOXKlbz88svExsaSl5dHt27daNOmDaNGjeL5558PeL2GxtqQmJgYFixYwNy5c8nLyzvuT7O+rX7Pnj0pLCwkLCyMyy+/nB07dpCQkHBcNY9Uf3tPnDiRnj17smXLlrp+wTfXu0BpaMxfffUVL774Ip06dWLkyJFkZ2eTm5tLfn4+0dHRzXLM17+O5+fnt8g5FjAZTx4AAAgHSURBVArq7/t77rmH//qv/2r2usf6v7O0tJT8/HyioqICWqspxxpAQkICTz31FFOnTmXVqlWMGjUqoP05UjDezCxbtoxOnTqxevVqNm7cyKxZs+o+DAikpm73lq5Z/7rSp08fdu7cyZIlSwJeH+CKK65gzpw59O/fn/Lycp544gkyMjKYOHEiOTk5PPfcc4wfP/74N8L/V//cTkxM5IwzzmjWDxih4W0QFRXFxx9/zMGDBznttNMa/T/2eNQ/vi+77DJGjhzJ7Nmzm+V63tg+v+qqq3j44Yfp3r07S5cu5ZFHHuHiiy9m8uTJAe9HSzDA0Enngw8+oF+/fgD069ePlStXNlutM844gx07dgAwePBgBg8eTN++fVmyZAnvv/8+hYWFlJaW8qc//YnRo0c3Wz/gmzfMkyZNYt++fezZs4cLLrgg4DUaGm9YWBixsbEA9O7dG4B3332XN954gz/+8Y8cOnSIlJSUgPen1urVqxk7diynnnpqwNpsyljDwsLq/q72E+OdO3fyzjvvMGjQIOCbkCXQampq+OMf/8iNN94Y8LaPpf659eKLL9KzZ0/Gjh3LZZddFtAXPEdqaF889dRTLF68mPDw8IDMPqhv9+7dbNu2jcsvv5yamhrKy8uZOHEiAD/72c8YMWLEUbMTAqGhsZ5yyil16xz57ed9+vQBoFu3brz55pvNVr9Dhw4A/O1vf+P222/nq6++4q9//St79uwJWIBxrO199913s2PHDqqrq/nkk0/o0qULQEBf+DY05o8++ogrr7yyrl5JSQmzZs3itttuo6amhjvvvDOgs77g2Nfxbdu2Nfs5diyBfrZLY461
7++55x769u3bIlP9j9zmsbGxnHfeeQEPL6BpxxocfX4395u9tLQ0li5dWneut4R3332Xl156ibVr11JTU0O3bt2apU5Tt3tL16x/XWndunWz1e/QoUPdcXXmmWfyxRdf8MEHH9S9jujXrx9FRUXHXb9WQ+d2S2hoG9x55508++yzHDhwgLFjxzZb/frH94UXXkj79u0ZO3Ysffr0CfiHeo3t8969exMeHs7pp5/OeeedR1hYGJGRkQGt35K8hUTNqrmmAzamZ8+ebNq0CYA33ngjoJ8O1peRkcFLL73Enj17AOqmlP/lL39hzZo1LF++nPz8/ONK0b+rZ555hhEjRrBmzRrS09OPepMTKA2NNyIign379vH111/z9ttvA5CYmMgvfvELCgsLKSoqIicnJ+D9qZWQkEBycnJA22zKWDt16lR3rG/duhWAf/3Xf6V3796sXr2awsJCtmzZEtD+wTffhNEcszoaUv/cOv/883nooYdYvHgxDz74YLN9QtzQvvjd737H6tWref7555vleF+yZAmPPPIIy5cvZ8WKFVxwwQV8/fXXwDf7/OKLL2bFihUBrdnQWGuPsaqqKt5555269Y8VnjVH/Vpz587l9ttvZ82aNfTp0yeg2/1Y2/svf/kLAD/+8Y8ZP348GRkZ3H777Vx88cUBq9vQmM877zxeeeUVCgsL+fOf/8yFF17Iueeey5NPPslNN90U8NsL4P9ex7/66qsWOceC7Vj7/v3332+R2vW3+VlnnRXQ514cqSnHGgT+/A41iYmJZGVlUVhYyJo1a3jyySebpU5Tt3tL16x/XYmMjDzu24Abu5bXP66a83VzMM/thrbByJEjWbZsGatWrSI9Pb3Z6tc/vufOncu9997L4sWLWblyJR999FFA633XfX7kv09UzsD4Afjkk0+YO3cu06dPb9G6VVVVXHvttaxbt65F615xxRW8+OKLDBw4kA4dOgT8mymO1LlzZx5//HGuu+46wsLCCA8P57bbbjvqE6OuXbvy17/+tdlvIxkyZAjZ2dn8z//8T7PVOdZ4J0+ezGmnncaQIUPo0aMH3bt3B+Cmm27ipptuqntBcvvttx9132Egffrppxw8eDCgLz6aMtZLL72Uhx9+mIyMDM4880zOPPNMunTpwjXXXENqaioRERGce+65/Od//mfA+vfuu++Sn5/P8uXLA9bmt6l/bv3kJz+pm1lz2WWXNduL/ob2xfr16xkwYACXXHJJs3xq+MILL/Dyyy/X/Txo0CC2b99ed/vCL3/5y4BPd25orKeffjqZmZn07t2brl27BrTmd6k/b9484JsXfrfddhuJiYkBfzN9rO2dm5vL3XffzYMPPshXX31FZGQkQ4YMCWhg2dCYe/XqxejRozl8+DCtW7fmpZdeYtKkSezatYuvv/6amTNnBqwPtepfxysqKlrkHDuWyZMn88ADDzT77Y/Q8L5vCS3xf2etphxrPwQjR46koKCAQYMGERYWxtixY/npT38a8DrB2O7Hc12Ji4vj4MGDjBkzhgceeOB7BQrfdi0/0o033sjYsWN57rnn6Nq1K1OnTg3EJgAaPrdbYhZGQ9sgKiqKxMREwsPDadWq+d4K1z++U1NTyc/PJzw8nPj4eOLj4wNaryn7/EQXVnMyRroKCZs2bWLr1q0tNr1damljxozh4YcfpkePHi1e+4UXXuCdd95h2rRpfPjhh5x11lkt3gdJak4tcQuJvhGMW0ikYLn11lv5yU9+0qLPEVPgGGBI0vcUzADjyy+/5MorryQsLIzu3buflAm7JElSIN1yyy2Ul5ezePHiYHdF35MBhiRJkiRJCnk+xFOSJEmSJIU8AwxJkiRJkhTyDDAkSZIkSVLIM8CQJEmSJEkhzwBDkiRJkiSFvP8H9zZh4Get+0kAAAAASUVORK5CYII=
" />
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABCsAAADQCAYAAAAu0euYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3de3xU5Z3H8e/kJqDNrbAIGy7VpQ2iVC5BmMkwgZgQQqCAzVAXA5TFFott2RUJQipgpOXS0rK74OoCFRewZngpInSBBgwkJAjEgmEJclljQwAlJiRpFJKZnP2DzTSREFAzl+Dn/dfkmTPze57JkzlnvjnnGZNhGIYAAAAAAAD8RICvOwAAAAAAANAUYQUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/EuTrDtzIoEGDdO+99/q6GwAAAAAAwEPOnj2rwsLC69r9Nqy49957lZWV5etuAAAAAAAAD7Hb7S22cxkIAAAAAADwK4QVAAAAAADArxBWAAAAAAAAv0JYAQAAAAAA/AphBQAAAAAA8CuEFQAAAAAAwK8QVgAAAAAAAL8S5OsO3I56z9vhs9olS8f4rDYAAAAAAG2BMysAAAAAAIBfaTWsaGho0LRp02S1WhUbG6uTJ08qLy9PZrNZsbGxKioqkiRdvHhRiYmJslgs2rhxoyTJ5XJp+vTpslqtmj17tvs5V61aJYvFonHjxqm6utqDQwMAAAAAAO1Rq2HF0aNHdfXqVeXm5upXv/qVVq5cqQULFmjHjh3avHmz0tPTJUnLli3T3LlztW/fPq1evVpXrlzR9u3b1b17d+Xm5qq2tlYFBQUqLy/Xtm3blJeXp0mTJmn16tVeGSQAAAAAAGg/Wl2zIioqSoZhyDAMVVZW6s4771RgYKAiIiIUERGhiooKSdKhQ4f0m9/8RgEBARo8eLCOHz+u/Px8jRlzbf2EpKQkHThwQJcvX5bNZpPJZFJSUpKmTp3arJ7D4ZDD4ZAklZaWemK8tz3WywAAAAAAtHethhWdO3dWcHCwoqOjdeXKFeXm5upnP/vZ3x4cFKS6ujrV19crIODaSRphYWGqqKhQZWWlQkNDb9rWVGpqqlJTUyVJdru97UYJAAAAAADajVbDit27dysoKEjvv/++jhw5oqeeeqrZOhNOp1MhISEKDg5WQ0ODAgICVFVVpcjISIWHh7u3bdp25syZZm0AAAAAAABNtbpmhWEY+uY3vynp2lkWNTU1cjqdunz5skpLS91hQ0xMjHJycuR0OlVYWKh+/frJbDYrOztbkrRr1y5ZLBbFxMRo//79zdoAAAAAAACaavXMioSEBL388suy2Wy6evWqVq5cKafTqeTkZJlMJq1Zs0aSlJ6erilTpigjI0MzZ85Ux44dlZKSoq1bt8pqtWrAgAEaNmyYJGnMmDGyWCyKiIjQpk2bPD9CAAAAAADQrpgMwzB83YmW2O12ZWVl+bobX4ovF7n0JRbYBAAAAAB8ETf67N/qZSAAAAAAAADeRlgBAAAAAAD8CmEFAAAAAADwK4QVAAAAAADArxBWAAAAAAAAv0JYAQAAAAAA/AphBQAAAAAA8CuEFQAAAAAAwK8QVgAAAAAAAL9CWAEAAAAAAPwKYQUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/0mpYUVBQoLi4OMXFxenb3/62/vmf/1l5eXkym82KjY1VUVGRJOnixYtKTEyUxWLRxo0bJUkul0vTp0+X1WrV7Nmz3c+5atUqWSwWjRs3TtXV1R4cGgAAAAAAaI9aDSuGDRumnJwc5eTkyGw2a/z48VqwYIF27NihzZs3Kz09XZK0bNkyzZ07V/v27dPq1at15coVbd++Xd27d1dubq5qa2tVUFCg8vJybdu2TXl5eZo0aZJWr17tlUECAAAAAID245YuA6mrq9OhQ4c0ePBgBQYGKiIiQj179lRF
RYUk6dChQxo5cqSCgoI0ePBgHT9+XPn5+UpMTJQkJSUl6cCBAzp8+LBsNptMJpO7DQAAAAAAoKmgW9koOztb8fHxqqqqUmho6N8eHBSkuro61dfXKyDgWu4RFhamiooKVVZWurdtra0ph8Mhh8MhSSotLf3qowMAAAAAAO3OLZ1Z4XA4lJqaqvDw8GbrTDidToWEhCg4OFgNDQ2SpKqqKkVGRjbbtrW2plJTU5WVlaWsrCz16NGjTQYIAAAAAADal5uGFfX19Tp8+LBiY2PVqVMnOZ1OXb58WaWlpe6wISYmRjk5OXI6nSosLFS/fv1kNpuVnZ0tSdq1a5csFotiYmK0f//+Zm0AAAAAAABN3fQykOzsbI0cOdJ9mcfzzz+v5ORkmUwmrVmzRpKUnp6uKVOmKCMjQzNnzlTHjh2VkpKirVu3ymq1asCAARo2bJgkacyYMbJYLIqIiNCmTZs8ODQAAAAAANAemQzDMHzdiZbY7XZlZWX5uhtfSu95O3zdBZ8oWTrG110AAAAAALQjN/rsf0trVgAAAAAAAHgLYQUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/QlgBAAAAAAD8CmEFAAAAAADwK4QVAAAAAADArxBWAAAAAAAAv0JYAQAAAAAA/AphBQAAAAAA8CuEFQAAAAAAwK8QVgAAAAAAAL9CWAEAAAAAAPzKTcOKnJwcxcfHa8SIEXrjjTeUl5cns9ms2NhYFRUVSZIuXryoxMREWSwWbdy4UZLkcrk0ffp0Wa1WzZ492/18q1atksVi0bhx41RdXe2hYQEAAAAAgPaq1bDis88+029+8xv993//t95++21NmDBBCxYs0I4dO7R582alp6dLkpYtW6a5c+dq3759Wr16ta5cuaLt27ere/fuys3NVW1trQoKClReXq5t27YpLy9PkyZN0urVq70ySAAAAAAA0H60GlYUFBSoY8eOGjt2rCZMmKALFy4oMDBQERER6tmzpyoqKiRJhw4d0siRIxUUFKTBgwfr+PHjys/PV2JioiQpKSlJBw4c0OHDh2Wz2WQymdxtAAAAAAAATQW1dudHH32kM2fO6ODBg8rOztbChQsVGhr6twcHBamurk719fUKCLiWe4SFhamiokKVlZXubVtra8rhcMjhcEiSSktL226UAAAAAACg3Wj1zIrw8HBZLBaFhIQoPj5ef/7zn5utM+F0OhUSEqLg4GA1NDRIkqqqqhQZGanw8HD3tq21NZWamqqsrCxlZWWpR48ebTpQAAAAAADQPrQaVsTExKi4uFiGYejo0aO677775HQ6dfnyZZWWlrrDhpiYGOXk5MjpdKqwsFD9+vWT2WxWdna2JGnXrl2yWCyKiYnR/v37m7UBAAAAAAA01eplIJ07d9aECRPc60ysX79eZWVlSk5Olslk0po1ayRJ6enpmjJlijIyMjRz5kx17NhRKSkp2rp1q6xWqwYMGKBhw4ZJksaMGSOLxaKIiAht2rTJ8yMEAAAAAADtiskwDMPXnWiJ3W5XVlaWr7vxpfSet8PXXfCJkqVjfN0FAAAAAEA7cqPP/q1eBgIAAAAAAOBthBUAAAAAAMCvEFYAAAAAAAC/QlgBAAAAAAD8SqvfBgJ8ESwsCgAAAABoC5xZAQAAAAAA/AphBQAAAAAA8CuEFQAAAAAAwK8QVgAAAAAAAL9CWAEAAAAAAPwKYQUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/0mpYUVJSoi5duiguLk5xcXG6dOmSHA6HzGaz4uPjde7cOUnSyZMnNXz4cJnNZu3Zs0eSVFtbq4kTJyo2NlbLly93P2d6erqsVqvS0tJUX1/vwaEBAAAAAID26KZnVthsNuXk5CgnJ0cRERFauXKlcnJy9NxzzykzM1OSNH/+fK1bt047d+7Us88+K0lau3atkpOTlZeXp71796qsrEzHjh1TWVmZcnNzFR0drS1btnh2dAAAAAAAoN25aVhx4MABWa1W
zZ8/X6dPn1bfvn0VEhIii8Wi9957T5J0/vx59enTR6GhoYqMjFR5ebny8/OVmJgoSUpISFBBQUGztqSkJB04cMCDQwMAAAAAAO1RUGt3duvWTWfOnFGnTp30+OOP6/XXX1doaKj7fpfLJUlqaGhwt4WFhamiokKVlZXubZu2devWrVlbUw6HQw6HQ5JUWlraBsMDAAAAAADtTatnVtxxxx268847ZTKZNHHiRB07dkzV1dXu+wMDA689ScDfnqaqqkqRkZEKDw93b9taW1OpqanKyspSVlaWevTo0TYjBAAAAAAA7UqrYUVNTY37dm5ursaMGaPi4mLV1dUpPz9f/fv3l3TtDIyzZ8+qpqZGFRUV6ty5s8xms7KzsyVJ2dnZGjp0aLO2Xbt2yWKxeGpcAAAAAACgnWr1MpC8vDxlZGSoU6dO+ta3vqXMzEx16NBBcXFx6tChgzZs2CBJWrJkiaZNmyaXy6XFixdLkmbMmKHHHntM69evV0pKiqKiohQVFaWuXbvKarWqZ8+emjNnjudHCAAAAAAA2hWTYRiGrzvRErvdrqysLF9340vpPW+Hr7sALypZOsbXXQAAAACAdulGn/1v+m0gAAAAAAAA3kRYAQAAAAAA/AphBQAAAAAA8CuEFQAAAAAAwK8QVgAAAAAAAL9CWAEAAAAAAPwKYQUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/QlgBAAAAAAD8CmEFAAAAAADwK4QVAAAAAADArxBWAAAAAAAAv3JLYcWrr76qLl26SJIcDofMZrPi4+N17tw5SdLJkyc1fPhwmc1m7dmzR5JUW1uriRMnKjY2VsuXL3c/V3p6uqxWq9LS0lRfX9/W4wEAAAAAAO1c0M02cLlccjgc6tGjh5xOp1auXKl9+/bp8OHDyszM1Isvvqj58+dr3bp16tq1q0aPHq34+HitXbtWycnJmjFjhpKSkjR58mSVl5errKxMubm5WrJkibZs2aJHH33UG+MEPKb3vB0+q12ydIzPagMAAACAp9z0zIpXX31VqampCggI0OnTp9W3b1+FhITIYrHovffekySdP39effr0UWhoqCIjI1VeXq78/HwlJiZKkhISElRQUNCsLSkpSQcOHPDg0AAAAAAAQHvUaljhcrmUlZWlSZMmSZIqKysVGhra7H5JamhocLeFhYWpoqKi2battTXlcDhkt9tlt9tVWlraBsMDAAAAAADtTauXgWzcuFF2u10BAdcyjfDwcFVXV7vvDwwMlCT3/ZJUVVWlyMhI97bh4eGqqqpSr1695HQ63Y9v3K6p1NRUpaamSpLsdnsbDA8AAAAAALQ3rZ5ZceLECb3yyitKSkrS6dOn9W//9m8qLi5WXV2d8vPz1b9/f0lSt27ddPbsWdXU1KiiokKdO3eW2WxWdna2JCk7O1tDhw5t1rZr1y5ZLBYPDw8AAAAAALQ3rZ5ZsWzZMvftwYMH64UXXtBrr72muLg4dejQQRs2bJAkLVmyRNOmTZPL5dLixYslSTNmzNBjjz2m9evXKyUlRVFRUYqKilLXrl1ltVrVs2dPzZkzx4NDAwAAAAAA7ZHJMAzD151oid1uV1ZWlq+78aX48tsh8PXCt4EAAAAAaM9u9Nn/pt8GAgAAAAAA4E2EFQAAAAAAwK+0umYFAP/my0uOuAQFAAAAgKdwZgUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/QlgBAAAAAAD8CgtsAvhSWNwTAAAAgKdwZgUAAAAAAPArhBUAAAAAAMCvEFYAAAAAAAC/wpoVANod1ssAAAAAbm+EFQDwBRCUAAAAAJ7X6mUgH330kcxms2w2m0aOHKkLFy4oLy9PZrNZsbGxKioqkiRdvHhRiYmJslgs2rhxoyTJ5XJp+vTpslqtmj17tvs5V61aJYvFonHjxqm6utqDQwMAAAAAAO1Rq2FF586dlZeXp3379mnKlClat26dFixYoB07dmjz5s1KT0+XJC1btkxz
587Vvn37tHr1al25ckXbt29X9+7dlZubq9raWhUUFKi8vFzbtm1TXl6eJk2apNWrV3tlkAAAAAAAoP1oNawIDAxUQMC1TWpqanTvvfcqMDBQERER6tmzpyoqKiRJhw4d0siRIxUUFKTBgwfr+PHjys/PV2JioiQpKSlJBw4c0OHDh2Wz2WQymdxtAAAAAAAATd10zYqjR4/qxz/+sS5fvqzdu3frtdde+9uDg4JUV1en+vp6d6gRFhamiooKVVZWKjQ09KZtTTkcDjkcDklSaWlp24wQAAAAAAC0KzcNKx588EG98847ysrK0pIlS5qtM+F0OhUSEqLg4GA1NDQoICBAVVVVioyMVHh4uHvbpm1nzpxp1tZUamqqUlNTJUl2u73NBgkAAAAAANqPVi8Dqaurc98OCwvTXXfdJafTqcuXL6u0tNQdNsTExCgnJ0dOp1OFhYXq16+fzGazsrOzJUm7du2SxWJRTEyM9u/f36wNAAAAAACgqVbPrDh69KjmzJmjwMBAdejQQevXr9fp06eVnJwsk8mkNWvWSJLS09M1ZcoUZWRkaObMmerYsaNSUlK0detWWa1WDRgwQMOGDZMkjRkzRhaLRREREdq0aZPnRwgAtwm+NhUAAABfFybDMAxfd6IldrtdWVlZvu7Gl+LLDxQA4AmEFQAAAPCEG332v+maFQAAcFYHAAAAvKnVNSsAAAAAAAC8jbACAAAAAAD4FcIKAAAAAADgVwgrAAAAAACAX2GBTQCAX2NxTwAAgK8fzqwAAAAAAAB+hbACAAAAAAD4FcIKAAAAAADgVwgrAAAAAACAXyGsAAAAAAAAfoWwAgAAAAAA+BXCCgAAAAAA4FdaDSsOHTqkYcOGafjw4Xr00UdVX18vh8Mhs9ms+Ph4nTt3TpJ08uRJDR8+XGazWXv27JEk1dbWauLEiYqNjdXy5cvdz5meni6r1aq0tDTV19d7cGgAAAAAAKA9ajWs6NGjh/bu3av9+/erd+/eevPNN7Vy5Url5OToueeeU2ZmpiRp/vz5WrdunXbu3Klnn31WkrR27VolJycrLy9Pe/fuVVlZmY4dO6aysjLl5uYqOjpaW7Zs8fwIAQAAAABAuxLU2p3dunVz3w4JCdH777+vvn37KiQkRBaLRXPmzJEknT9/Xn369JEkRUZGqry8XPn5+VqxYoUkKSEhQQUFBbp06ZISExMlSUlJSfr973+vRx991CMDAwDgq+o9b4fPapcsHeOz2gAAAL7WaljR6MMPP9Tu3bu1dOlSXbp0yd3ucrkkSQ0NDe62sLAwVVRUqLKyUqGhode1NQYgjW1NORwOORwOSVJpaelXGBYAAO0bQQkAAPg6u2lYUV1drbS0NL388styuVyqrq523xcYGChJCgj429UkVVVVioyMVHh4uKqrqxUeHq6qqir16tVLTqfT/fjG7ZpKTU1VamqqJMlut3/10QEAgC+MoAQAAPhaq2tWOJ1O/eAHP9DChQv1ne98R3369FFxcbHq6uqUn5+v/v37S7p2ucjZs2dVU1OjiooKde7cWWazWdnZ2ZKk7OxsDR06tFnbrl27ZLFYPDw8AAAAAADQ3rR6ZsWrr76qd955R5mZmcrMzNQTTzyh2bNnKy4uTh06dNCGDRskSUuWLNG0adPkcrm0ePFiSdKMGTP02GOPaf369UpJSVFUVJSioqLUtWtXWa1W9ezZ073mBQAAAAAAQCOTYRiGrzvRErvdrqysLF9340vx5emzAAC0Z1wGAgDA18uNPvvf0gKbAAAA3sB6GQAAQCKsAAAAkERQAgCAP2l1gU0AAAAAAABv48wKAAAAH+OsDgAAmuPMCgAAAAAA4FcIKwAAAAAAgF8hrAAAAAAAAH6FsAIAAAAAAPgVFtgEAAD4GmNxTwCAPyKsAAAAgE/4MijxJUIaALg5LgMBAAAAAAB+hbACAAAAAAD4FS4DAQAAALyIdUIA4OZaDSuqqqqUkJCgEydO6ODBg7r//vvl
cDj029/+Vh07dtSGDRsUFRWlkydP6kc/+pGcTqcyMzMVHx+v2tpapaWl6eOPP9a4ceM0d+5cSVJ6erry8/PVu3dvrV+/XsHBwV4ZKAAAAPB1R1ACoL1oNazo1KmTduzYoaefflqS5HQ6tXLlSu3bt0+HDx9WZmamXnzxRc2fP1/r1q1T165dNXr0aMXHx2vt2rVKTk7WjBkzlJSUpMmTJ6u8vFxlZWXKzc3VkiVLtGXLFj366KNeGSgAAAAA3yEoAfBFtBpWBAcHq0uXLu6fT58+rb59+yokJEQWi0Vz5syRJJ0/f159+vSRJEVGRqq8vFz5+flasWKFJCkhIUEFBQW6dOmSEhMTJUlJSUn6/e9/T1gBAAAAwKMISoD25wutWVFZWanQ0FD3zy6XS5LU0NDgbgsLC1NFRUWzbZu2devWrVlbUw6HQw6HQ5JUWlr6JYYDAAAAAADauy8UVoSHh6u6utr9c2BgoCQpIOBvXypSVVWlyMhI97bh4eGqqqpSr1695HQ63Y9v3K6p1NRUpaamSpLsdvuXGxEAAAAA+AlfntUB7+NMmrbzhcKKPn36qLi4WHV1dTpy5Ij69+8vSerWrZvOnj2rv/u7v1NFRYU6d+4ss9ms7OxsTZ8+XdnZ2frP//xPlZeXa+XKlZoyZYp27doli8XikUEBAAAAAOBtXHLUdm4aViQnJ+vo0aN6//339eMf/1izZ89WXFycOnTooA0bNkiSlixZomnTpsnlcmnx4sWSpBkzZuixxx7T+vXrlZKSoqioKEVFRalr166yWq3q2bOne80LAAAAAACARjcNK/74xz9e1zZp0qRmP993333Kzc1t1nbXXXdp69at1z22cdFNAAAAAACAlgTcfBMAAAAAAADvIawAAAAAAAB+hbACAAAAAAD4FcIKAAAAAADgVwgrAAAAAACAXyGsAAAAAAAAfoWwAgAAAAAA+BXCCgAAAAAA4FcIKwAAAAAAgF8hrAAAAAAAAH6FsAIAAAAAAPgVwgoAAAAAAOBXCCsAAAAAAIBf8UlYkZ6eLqvVqrS0NNXX1/uiCwAAAAAAwE95Paw4duyYysrKlJubq+joaG3ZssXbXQAAAAAAAH7M62FFfn6+EhMTJUlJSUk6cOCAt7sAAAAAAAD8WJC3C1ZWVqpbt26SpLCwMFVUVLjvczgccjgckqQjR47Ibrd7u3ttYogPa5eWlqpHjx7Upja1qU1talOb2tSmNrWpTe2vUe1hw573We2v4uzZsy3fYXjZ6tWrjQ0bNhiGYRhHjhwxZs2a5e0u3NZSU1OpTW1qU5va1KY2talNbWpTm9rUbte8fhmI2WxWdna2JGnXrl2yWCze7gIAAAAAAPBjgYsWLVrkzYJ333238vPzlZmZqbq6Oj3zzDMKDAz0Zhdue/369aM2talNbWpTm9rUpja1qU1talO73TIZhmH4uhMAAAAAAACNvH4ZCAAAAAAAQGsIK24Tx48f17Rp03zdja+NnJwczZkzx9fdkCQNHjzY113Abcqf5rk3lJSUyGQy6dChQ5Kk7du3y8tXSsKLvm7zG75VUlKi3bt3e72uP8zz1o5Rp02bpuPHj3u1PxcvXtTChQu9WrOkpETf//73vVqz0datW/Xxxx83a3v++ee1b9++Nq9VUlKiLl26KC4uTnFxcXrmmWfavMYXre/t4+ScnBwtWrRIcXFxXq17K44cOaJ3333X1934QggrAAD4f/fdd5+WL1/u6274pYaGBl93Abe523mO+SqswPXuvvtuLV682Nfd8JrPhxWffvqpYmJiZLPZPFLPZrMpJydHOTk5+tWvfuWRGv5c3xuuXLmimTNnymazyWKxyOFw3PQxDQ0NWrZsmfr27euFHrYdwop2zOl0ym636+GHH9Zvf/tbSdLOnTtltVplNpv16quveqzeP/3TP2natGnN0srG2+Xl5Ro/frxGjhypyZMny+VytWk/mvroo480YsQIWa1Wff/73/dorZb8y7/8
i2w2m4YMGaKjR496vJ5hGPrpT3+qESNG6OGHH9a5c+c8XvNGdfv27aupU6fqwQcf1KZNmzxWOycnR0lJSZowYYK++93v6vjx4/rDH/6ghx56SEOHDtWuXbs8WrvxP1KN/xkaOHCgnnzyST300ENatmyZx2o3+vnPf678/HxJ0u7du7VgwQKP1/w8b73e0vVz7S9/+Yvi4uI0YsQIfe973/NobUnq27evnE6nTp065W7z5PhvdY796U9/ks1mU0xMjJYuXdpmtRMTEzV27FjFxMSoqKioxbHGxcVp7ty5GjVqVJvUbVp/1KhR7r/t1157TaNGjdKQIUP0ySef6Je//KVsNpuGDx+uoqKiNq3dVEtj/uEPfyir1aq4uDiVlJS0ec2DBw/qoYce0ogRI7Ro0SKP7rsbGYahWbNmyWq1asSIEcrNzVVsbKwsFov7gH7RokVKS0tTcnKybDabPvvss69Us6U51tJ+s+kc89R+/fOveUvHL570wgsv6LXXXlNcXJxWrlzp/n3v3bvX47Wllo9XPDnPvX2M+nmtvb8UFhZ6/CwHbx2f3myf+cEHH2jnzp364Q9/qLlz56qoqEijR4/WwoUL9eSTT3qkT01VVFR4dR/uD4YMGaInn3xSr7zyisdqZGZm6p577tG+ffu0e/durVixQidPnmz1MX/5y180f/58dezY0WP98ggffm0qviKHw2E888wzhmEYxgsvvGBMmTLFMJvNxtWrVw2n02mYzWbD6XR6rN7UqVONQYMGue9vvP3UU08Ze/bsMQzDMJYuXWo4HI4268PnXb161aivrzcMwzB+9rOfGbt37/ZYrabefvtt46mnnjJqa2sNwzCMd9991/jHf/xHj9d96623jF/84heGYRjGwYMHjVmzZjX7HXizbnh4uFFVVWVUVVUZQ4YM8Vjtt99+2xg5cqRhGIbxxz/+0Zg9e7bRv39/47PPPjOqqqo8Ov7G37NhGEZRUZExdepU41vf+pZRUlJiOJ1Oo1+/fh6r3aiwsNB44oknDMMwjClTphjFxcUer9no7bff9urrbRjXz7Xo6Gjj6aefNgzDMFwul0drf/DBB8Yjjzxi5ObmGjNmzDDeeustIyMjw6Pjv9U51vhe43K5jMGDBxuffvppm9S2WLHG5FQAAAlRSURBVCxGQ0ODceLECSMlJaXFsdpsNiM7O/sr12up/sMPP2wYhmG8+OKLxvjx4w3DMIzf/e53xqpVq4wpU6YYhmEYZWVlxrhx4zxSv6X5XVdXZwwbNsxoaGgwDMMz8y4jI8PYsWOHYRiGe3/tqX13ozfffNN48skn3T+npKQYJ06cMBoaGoyEhATjgw8+MBYuXGgsXrzYMAzDmDt3rvHmm29+pZqfn2Njx45tcb/ZdI55ar/e9DV3uVwtHr94UuPfenl5uTFq1CijoaHB+Otf/2rYbDav1P386+7peX6rx6hTp041ioqK2rS2YbT+/rJu3TrjkUceafOaTX1+Hr/00kseqXkr+8ymr3Ftba37dz5mzBjj1KlTbdqfDz74wOjcubNhs9kMm81m/O53v/PaPvxG9b3x9+0tP/rRjwzDMIzvfOc7xpUrV9zta9euNRYvXuw+jjEMw6ipqXG/vxw+fNiIi4szYmNjjRUrVni9319FkK/DEnx5Z86c0aBBgyRJMTEx2rZtm06dOqXExERJ0uXLl3Xp0iXdfffdHql38ODBZvcb///FMidOnNA777yj5557Tp999pnS0tLapH5LPvnkEz3xxBOqrKzU+fPnNXDgQI/VasmKFSuUnZ0tSQoK8vyf04kTJ/TGG29o//79MgxDPXr08HjNG9W95557FBoaKkkeP6PlwQcflCT16NFDly9fVs+ePdWhQwd16NBBwcHBcjqdHnn9TSaT+3bj/I6IiFCvXr0kSR06dGjzmp83cOBAnThxQlVVVSotLVV0dLTHazZVU1Pjtddbun6uDRo0SHfeeacmT56sAQMGeOXa69jYWD37
7LO6cOGCLl265NHx3+ocKyws1OLFi1VfX6+SkhJ9/PHH7m2+igEDBshkMqlv3746efKkoqOjrxurdO093xP69+8vSerevbv79t///d/r7Nmzys/Pd1/z66mvOG9pfptMJs2aNUtpaWn65je/qSVLluiuu+5q07qzZs3S888/r02bNmnUqFEe3Xc3Ki4ubnba98WLF92nAw8cOFBnz56VdG1OSNfebysrK79y3aZz7MKFCzfcbzbOMU/t15u+5pMnT252n+HFL8Y7e/as/ud//kcjRoyQJF26dMkrdT//ugcHB3t0nt/qMaon3ej95cMPP/RoXen6eRwZGemROl90n3nhwgVlZmbKMAyVlJTo/Pnz6tOnT5v2yWazacuWLZKunWGzZMkSr+7Dm9aXpP/6r//yeE1vaTwLrK6uTnfccYe7PSoqSocPH77h4+bNm6fXX39dERERGjt2rNLS0tS1a1eP97ctEFa0Y//wD/+gP//5z3rkkUd05MgRde7cWdHR0dq9e7dCQkJUX1+v4OBgj9WTrh1A1tTUSJL+93//V5IUHR2tCRMmyGq1SpLq6+vbrA+ft3nzZqWkpGjGjBn66U9/6tUDjk8++UQHDx5UXl6eCgsL9dRTT3m8ZnR0tOx2u37xi19IuvbaDhs27Lat26jpBzqn06kPP/xQV65cUV1dnerq6jz2wTkiIsJ9qc2xY8eu64u3pKSkaObMmT45hfIb3/iGDh065JXXW7p+rtXU1Ogb3/iGJCkxMVF2u109e/b0WP1Gs2fP1oIFCzR+/Hjl5+d7bPy3OseWL1+u//iP/9A999yjgQMHttl73dGjR2UYhk6dOqXo6Ogb/m0FBHjmqtGmY216++rVq7LZbFq7dq0kz+1HWprfJpNJdrtdkydP1i9/+Uu9/vrrmjJlSpvWDQsL07//+7+rrq5OgwYN8ui+u1Hfvn2VnZ3tPv29S5cuKi4uVnR0tN59913NnDlTubm5LQZoX0XTOXb33XfrT3/6U4v7zcY55qn9+udf806dOl13/OJJwcHBcrlcuueee9S/f39t375dJpPJo8dIjVo6XnG5XB6d594+Rm3Jjd5fvHGs+Pl53KtXL49cznYr+8zGuSdJq1at0tSpUzVixAglJyd7/LWor693L2bqzX347cgwDL3yyit6/PHHFRISoqtXr7oDi3Pnzql79+43nOfvvfeeJkyYIEmqrKxUaWkpYQU8b/z48frDH/6g+Ph4ffvb31ZAQIAyMjKUkJCggIAAdenSRVlZWR6rJ8l9/euQIUPUvXt3SdKCBQv0+OOPu9+cli9f7rHrQePj45WWlqa33nrL69dgRUREKDIyUnFxcRo6dKhXao4dO1Z79+7ViBEjZDKZrvvv0O1WtyWBgYGaN2+ehg8froCAAD3//PMeq/XAAw/o008/VUJCgu6//36P1bmZyZMnKyMjQ6tWrfJ6bW++3tL1c81msyk7O1sBAQGKiopSVFSUR+s37ce8efM8Pv5bnWOPPPKIJkyYoAceeMB9INoWwsLCNHbsWH300Udat26djh8/7rXfdWs6deqkPn36yGazKSAgQAkJCZo/f36b12np91tTU6Pvfe97MplMMplMHlmT58UXX9Trr78up9OpadOm6f777/fYvrvR2LFjtXPnTsXGxio4OFiLFi3SjBkzZBiGxowZo969e7d5Ten6OZaRkdHqftNT+/XPv+ZdunS57vjFkx544AE988wzeuKJJ/SDH/xANptNgYGBeuCBB/Sv//qvHq3d0vGKp+e5t49R/Y23jk9vZZ85evRozZ49Ww8//LBGjx6tn/zkJ+rbt6/HQuh9+/a5z4oLCAhQfX29V/fhTevfd999Hq/nLStWrNDIkSMlSRMnTtSqVas0d+5c1dbWau3atdqwYYPCw8NVVlYm6W//AJGk7373u9qyZYvCwsLkcrk89rv3BJPhzX9F47Zx/Phx/frXv9bLL7/s664At72LFy9q5syZ2rp1q6+7
gttITk6Otm/frl//+te+7gpuU8wxAPjqTpw4odmzZ2vnzp0KCAjQZ599pp///OcqLi7WqVOn9MILL2jixImSpJ/85CcqKiqSzWZTXl6ecnJyVFhYqLlz56qhoUF33HGH3njjjXaz0CZnVgCAHztw4ICefvppDvYBAAC+hoqLixUbG6uAgAB9+OGH6tWrl1566SVJ0ksvvaRt27a5w4o1a9Zc9/hBgwZpz549Xu1zW+HMCgAAAAAA/NBf//pXTZgwQSaTST179nSv5/R1QFgBAAAAAAD8SvtZXQMAAAAAAHwtEFYAAAAAAAC/QlgBAAAAAAD8CmEFAAAAAADwK/8Hcc0DBrnBrr8AAAAASUVORK5CYII=
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">plot_hist</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="n">n_bins</span> <span class="o">=</span> <span class="mi">50</span><span class="p">):</span>
<span class="n">n</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">patches</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="n">n_bins</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'blue'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Mean: </span><span class="si">{</span><span class="n">mean</span><span class="p">(</span><span class="n">lens</span><span class="p">)</span><span class="si">}</span><span class="s1">, Median: </span><span class="si">{</span><span class="n">median</span><span class="p">(</span><span class="n">lens</span><span class="p">)</span><span class="si">}</span><span class="s1">, Standard Deviation: </span><span class="si">{</span><span class="n">stdev</span><span class="p">(</span><span class="n">lens</span><span class="p">)</span><span class="si">}</span><span class="s1">, Max: </span><span class="si">{</span><span class="n">np</span><span class="o">.</span><span class="n">percentile</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">lens</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Mean: 150.01203744984397, Median: 141.0, Standard Deviation: 44.59412209701778, Max: 513.0
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAPGklEQVR4nO3db6jcV53H8ffHVFt33W36525pk7C3YkAqrFUuNaIP3Bbb2BXTB0UqsgYJ5EkXKgiu3YUt/nmgT6wKq2yxwShi7apLQxG62bSw7APb3tham2ZLr6vSJNVEk9YVoWzqdx/MSRnivbn3JnNnbu55v2CY3+/8zsw9v0PmMydnzvwmVYUkqQ+vmXQDJEnjY+hLUkcMfUnqiKEvSR0x9CWpIxdMugFncvnll9f09PSkmyFJ55X9+/f/uqqm5ju2qkN/enqa2dnZSTdDks4rSX6x0DGndySpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOr+hu5vbvqqvnLjxwZbzskrR2G/iqwULhL0qg5vSNJHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRr7J5HvKSy5LOliN9SeqIoS9JHTH0Jakjhr4kdcTQl6SOLDn0k6xL8kSSB9v+1UkeTTKX5DtJXtfKL2z7c+349NBz3NnKn01y06hPRpJ0ZssZ6d8BHBza/zxwd1W9CTgB7GjlO4ATrfzuVo8k1wC3AW8BtgJfSbLu3JovSVqOJYV+ko3A3wBfa/sBrge+26rsBm5p29vaPu34Da3+NuC+qnq5qn4GzAHXjeIkJElLs9SR/heBTwB/aPuXAS9W1cm2fwjY0LY3AM8DtOMvtfqvls/zmFcl2ZlkNsnssWPHlnEqkqTFLBr6Sd4PHK2q/WNoD1V1T1XNVNXM1NTUOP6kJHVjKZdheBfwgSQ3AxcBfw58CVif5II2mt8IHG71DwObgENJLgAuBn4zVH7K8GMkSWOw6Ei/qu6sqo1VNc3gg9iHq+rDwCPAra3aduCBtr2n7dOOP1xV1cpva6t7rgY2A4+N7EwkSYs6lwuu/T1wX5LPAk8A97bye4FvJpkDjjN4o6CqDiS5H3gGOAncXlWvnMPflyQtUwaD8NVpZmamZmdnJ92MFbfQVTOXy6tsSgJIsr+qZuY75jdyJakjhr4kdcTQl6SOGPqS1BFDX5I64m/kriFnWgXkyh5J4Ehfkrpi6EtSRwx9SeqIoS9JHTH0Jakjrt7pxEIre1zVI/XFkb4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHvMrmGJ3pN2wlaRwc6UtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR1ZNPSTXJTksSQ/TnIgyada+dVJHk0yl+Q7SV7Xyi9s+3Pt+PTQc93Zyp9NctNKnZQkaX5LGem/DFxfVW8FrgW2JtkCfB64u6reBJwAdrT6O4ATrfzuVo8k1wC3AW8BtgJfSbJulCcjSTqzRUO/Bn7Xdl/bbgVcD3y3le8Gbmnb29o+7fgNSdLK76uql6vqZ8AccN1IzkKStCRLmtNPsi7Jk8BRYC/wU+DFqjrZqhwCNrTtDcDzAO34S8Blw+XzPGb4b+1MMptk9tixY8s/I0nSgpYU+lX1SlVdC2xkMDp/80o1qKruqaqZqpqZmppaqT8jSV1a1uqdqnoReAR4J7A+yakfYdkIHG7bh4FNAO34xcBvhsvneYwkaQyWsnpnKsn6tv164L3AQQbhf2urth14oG3vafu04w9XVbXy29rqnquBzcBjozoRSdLilvJziVcCu9tKm9cA91fVg0meAe5L8lngCeDe
Vv9e4JtJ5oDjDFbsUFUHktwPPAOcBG6vqldGezqSpDPJYBC+Os3MzNTs7OykmzEyq/E3co8cmXQLJI1akv1VNTPfMb+RK0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI0v5jVytYQv9hKM/oyitTY70Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXEyzCsgIUubSBJk+ZIX5I6YuhLUkcMfUnqiKEvSR0x9CWpI67e0bz8cRVpbVp0pJ9kU5JHkjyT5ECSO1r5pUn2Jnmu3V/SypPky0nmkjyV5O1Dz7W91X8uyfaVOy1J0nyWMr1zEvh4VV0DbAFuT3IN8ElgX1VtBva1fYD3AZvbbSfwVRi8SQB3Ae8ArgPuOvVGIUkaj0VDv6peqKofte3/BQ4CG4BtwO5WbTdwS9veBnyjBn4IrE9yJXATsLeqjlfVCWAvsHWkZyNJOqNlfZCbZBp4G/AocEVVvdAO/RK4om1vAJ4fetihVrZQ+el/Y2eS2SSzx44dW07zJEmLWHLoJ3kD8D3gY1X12+FjVVVAjaJBVXVPVc1U1czU1NQonlKS1Cwp9JO8lkHgf6uqvt+Kf9WmbWj3R1v5YWDT0MM3trKFyiVJY7KU1TsB7gUOVtUXhg7tAU6twNkOPDBU/pG2imcL8FKbBnoIuDHJJe0D3BtbmSRpTJayTv9dwN8CP0nyZCv7B+BzwP1JdgC/AD7Yjv0AuBmYA34PfBSgqo4n+QzweKv36ao6PpKzkCQtyaKhX1X/BWSBwzfMU7+A2xd4rl3AruU0UJI0Ol6GQZI6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdWcpVNqVXXXXV/OVHjoy3HZLOjiN9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I64jp9jYTr96XzgyN9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiN/IPQcLfQtVklYrR/qS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4uGfpJdSY4meXqo7NIke5M81+4vaeVJ8uUkc0meSvL2ocdsb/WfS7J9ZU5HknQmSxnpfx3YelrZJ4F9VbUZ2Nf2Ad4HbG63ncBXYfAmAdwFvAO4Drjr1BuFJGl8Fg39qvpP4PhpxduA3W17N3DLUPk3auCHwPokVwI3AXur6nhVnQD28sdvJJKkFXa2c/pXVNULbfuXwBVtewPw/FC9Q61sofI/kmRnktkks8eOHTvL5kmS5nPOH+RWVQE1gracer57qmqmqmampqZG9bSSJM4+9H/Vpm1o90db+WFg01C9ja1soXJJ0hidbejvAU6twNkOPDBU/pG2imcL8FKbBnoIuDHJJe0D3Btbmda4q66a/yZpMha9ymaSbwPvAS5PcojBKpzPAfcn2QH8Avhgq/4D4GZgDvg98FGAqjqe5DPA463ep6vq9A+HJUkrbNHQr6oPLXDohnnqFnD7As+zC9i1rNZJkkbKb+RKUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjiy6ZFNaCQt9QevIkfG2Q+qNI31J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjriOv0l8Ec/xsf1+9LKcqQvSR0x9CWpI4a+JHXE0JekjvhBrs4LfsArjYYjfUnqiKEvSR0x9CWpI87p67zmXL+0PI70Jakjhr4kdcTQl6SOOKevNcm5fml+hr66cqYrpvqGoB44vSNJHXGkLzVOCakHjvQlqSOGviR1xOmdIf4soqS1ztCXFjGqwYCfDWg1MPSlMfGDYq0GzulLUkfGPtJPshX4ErAO+FpVfW7cbZBWE6ePNE5jDf0k64B/Bt4LHAIeT7Kn
qp4ZZzuktWi500d+O7lP4x7pXwfMVdX/ACS5D9gGrEjouxpHOrvXwfny2vHNafnGHfobgOeH9g8B7xiukGQnsLPt/i7Js2Nq20q4HPj1pBuxCtgPA/bDwMj6IRnFs0zESv9b+MuFDqy61TtVdQ9wz6TbMQpJZqtqZtLtmDT7YcB+GLAfJtsH4169cxjYNLS/sZVJksZg3KH/OLA5ydVJXgfcBuwZcxskqVtjnd6pqpNJ/g54iMGSzV1VdWCcbRizNTFNNQL2w4D9MGA/TLAPUlWT+tuSpDHzG7mS1BFDX5I6YuifgyS7khxN8vRQ2aVJ9iZ5rt1f0sqT5MtJ5pI8leTtk2v56CTZlOSRJM8kOZDkjlbeWz9clOSxJD9u/fCpVn51kkfb+X6nLWAgyYVtf64dn55k+0ctybokTyR5sO131w9Jfp7kJ0meTDLbyib+ujD0z83Xga2nlX0S2FdVm4F9bR/gfcDmdtsJfHVMbVxpJ4GPV9U1wBbg9iTX0F8/vAxcX1VvBa4FtibZAnweuLuq3gScAHa0+juAE6387lZvLbkDODi032s//HVVXTu0Jn/yr4uq8nYON2AaeHpo/1ngyrZ9JfBs2/4X4EPz1VtLN+ABBtdW6rYfgD8BfsTg2+a/Bi5o5e8EHmrbDwHvbNsXtHqZdNtHdP4bGQTa9cCDQDrth58Dl59WNvHXhSP90buiql5o278Ermjb812CYsM4G7bS2n/N3wY8Sof90KY0ngSOAnuBnwIvVtXJVmX4XF/th3b8JeCy8bZ4xXwR+ATwh7Z/GX32QwH/nmR/u7wMrILXxaq7DMNaUlWVpIs1sUneAHwP+FhV/TZDF0XppR+q6hXg2iTrgX8D3jzhJo1dkvcDR6tqf5L3TLo9E/buqjqc5C+AvUn+e/jgpF4XjvRH71dJrgRo90db+Zq9BEWS1zII/G9V1fdbcXf9cEpVvQg8wmAaY32SU4Or4XN9tR/a8YuB34y5qSvhXcAHkvwcuI/BFM+X6K8fqKrD7f4og0HAdayC14WhP3p7gO1tezuDOe5T5R9pn9JvAV4a+m/eeSuDIf29wMGq+sLQod76YaqN8EnyegafaxxkEP63tmqn98Op/rkVeLjaZO75rKrurKqNVTXN4DIrD1fVh+msH5L8aZI/O7UN3Ag8zWp4XUz6w47z+QZ8G3gB+D8Gc3A7GMxH7gOeA/4DuLTVDYMfkPkp8BNgZtLtH1EfvJvB3OVTwJPtdnOH/fBXwBOtH54G/qmVvxF4DJgD/hW4sJVf1Pbn2vE3TvocVqBP3gM82GM/tPP9cbsdAP6xlU/8deFlGCSpI07vSFJHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUkf8H9haDbpr4hOQAAAAASUVORK5CYII=
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's get our data into a format that we can feed into our model using PyTorch's Dataset and DataLoader API. All these methods do is convert our dataframes, where each row holds multiple turns of historical dialog (the context) plus a response, into a single conversation string whose turns are separated by a special token that tells our model when a person has finished speaking.</p>
<p>These conversation strings are then tokenized using HuggingFace's awesome tokenizers into the numerical representation that our model actually understands!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">construct_conv</span><span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">eos</span> <span class="o">=</span> <span class="kc">True</span><span class="p">):</span>
<span class="c1"># from: https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists</span>
<span class="n">flatten</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="p">[</span><span class="n">item</span> <span class="k">for</span> <span class="n">sublist</span> <span class="ow">in</span> <span class="n">l</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">sublist</span><span class="p">]</span>
<span class="n">conv</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">reversed</span><span class="p">([</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="p">[</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">eos_token_id</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">row</span><span class="p">]))</span>
<span class="n">conv</span> <span class="o">=</span> <span class="n">flatten</span><span class="p">(</span><span class="n">conv</span><span class="p">)</span>
<span class="k">return</span> <span class="n">conv</span>
<span class="k">class</span> <span class="nc">ConversationDataset</span><span class="p">(</span><span class="n">Dataset</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">:</span> <span class="n">PreTrainedTokenizer</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">block_size</span><span class="o">=</span><span class="mi">512</span><span class="p">):</span>
<span class="n">block_size</span> <span class="o">=</span> <span class="n">block_size</span> <span class="o">-</span> <span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">max_len</span> <span class="o">-</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">max_len_single_sentence</span><span class="p">)</span>
<span class="n">directory</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">cache_dir</span>
<span class="n">cached_features_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
<span class="n">directory</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">model_type</span> <span class="o">+</span> <span class="s2">"_cached_lm_"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">block_size</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">cached_features_file</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">args</span><span class="o">.</span><span class="n">overwrite_cache</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Loading features from cached file </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">cached_features_file</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cached_features_file</span><span class="p">,</span> <span class="s2">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">handle</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">examples</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">handle</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Creating features from dataset file at </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">directory</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">examples</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="n">conv</span> <span class="o">=</span> <span class="n">construct_conv</span><span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">conv</span><span class="p">)</span> <span class="o">></span> <span class="n">block_size</span><span class="p">:</span> <span class="k">continue</span>
<span class="bp">self</span><span class="o">.</span><span class="n">examples</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">conv</span><span class="p">)</span>
            <span class="c1"># Note that we are losing the last truncated example here for the sake of simplicity (no padding)</span>
            <span class="c1"># If your dataset is small, first you should look for a bigger one :-) and second you</span>
<span class="c1"># can change this behavior by adding (model specific) padding.</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Saving features into cached file </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">cached_features_file</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cached_features_file</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">handle</span><span class="p">:</span>
<span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">examples</span><span class="p">,</span> <span class="n">handle</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">pickle</span><span class="o">.</span><span class="n">HIGHEST_PROTOCOL</span><span class="p">)</span>
<span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">examples</span><span class="p">)</span>
<span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">item</span><span class="p">):</span>
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">examples</span><span class="p">[</span><span class="n">item</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
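<p>To make the preprocessing concrete, here is a small sketch of what <code>construct_conv</code> produces, using a toy stand-in tokenizer (the real code uses a HuggingFace <code>PreTrainedTokenizer</code>, and <code>ToyTokenizer</code> and its character-level encoding are purely illustrative assumptions):</p>

```python
# Minimal sketch of construct_conv, with a toy stand-in tokenizer.
# The real notebook uses a HuggingFace PreTrainedTokenizer with subword ids.
class ToyTokenizer:
    eos_token_id = 0  # hypothetical id marking the end of a speaker's turn

    def encode(self, text):
        # Map each character to a fake token id; real tokenizers use subwords.
        return [ord(c) for c in text]

def construct_conv(row, tokenizer, eos=True):
    flatten = lambda l: [item for sublist in l for item in sublist]
    # Encode each turn, append the EOS token after it, then reverse so the
    # oldest context turn comes first and the response comes last.
    conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
    return flatten(conv)

tok = ToyTokenizer()
# A row is ordered response-first (context columns follow), hence the reversal.
ids = construct_conv(["hi", "yo"], tok)
print(ids)  # [121, 111, 0, 104, 105, 0] -> "yo"<eos>"hi"<eos>
```

<p>The key point is that every turn ends in the EOS token, which is how the model later learns where one speaker stops and the next begins.</p>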
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Training-and-Evaluating">Training and Evaluating<a class="anchor-link" href="#Training-and-Evaluating"> </a></h1><p>Now that we have THE DATA we can finally create our model and start training it! The training and evaluation loops are quite simple. We simply take a batch of examples from our dataloader and use it as both our inputs and labels. We do this because GPT2 is an auto-regressive model, meaning it uses some context to predict the next token. This prediction is then added to the original context and fed back in as the new context for generating the next token.</p>
<p>To evaluate our model, we use perplexity, a simple but powerful metric. Perplexity measures how unsure the model is in its choice of the next token: the more unsure the model, the higher its perplexity. One fascinating thing about perplexity is that it correlates very well with human judgments of how coherent and specific a conversation is, as shown in the amazing paper <a href="https://arxiv.org/abs/2001.09977">"Towards a Human-like Open-Domain Chatbot"</a> by Daniel Adiwardana, et al.</p>
</div>
</div>
</div>
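<p>Concretely, perplexity is just the exponential of the average per-token cross-entropy loss, so it falls straight out of the loss an evaluation loop already reports. A minimal sketch (not the exact evaluation code used here, just the relationship between loss and perplexity):</p>

```python
import math

def perplexity(avg_cross_entropy_loss):
    # Perplexity is the exponentiated average per-token cross-entropy (in nats).
    return math.exp(avg_cross_entropy_loss)

# A model that is perfectly certain (loss 0) has perplexity 1;
# a higher loss means the model is "more surprised" by each next token.
print(perplexity(0.0))            # 1.0
print(round(perplexity(3.0), 2))  # 20.09
```

<p>Intuitively, a perplexity of 20 means the model is, on average, about as unsure as if it were choosing uniformly among 20 equally likely next tokens.</p>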
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Training of model</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">train_dataset</span><span class="p">,</span> <span class="n">model</span><span class="p">:</span> <span class="n">PreTrainedModel</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">:</span> <span class="n">PreTrainedTokenizer</span><span class="p">)</span> <span class="o">-></span> <span class="n">Tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">float</span><span class="p">]:</span>
<span class="sd">""" Train the model """</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]:</span>
<span class="n">tb_writer</span> <span class="o">=</span> <span class="n">SummaryWriter</span><span class="p">()</span>
<span class="n">args</span><span class="o">.</span><span class="n">train_batch_size</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">per_gpu_train_batch_size</span> <span class="o">*</span> <span class="nb">max</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">collate</span><span class="p">(</span><span class="n">examples</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">]):</span>
<span class="k">if</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">_pad_token</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">examples</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">examples</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span><span class="p">)</span>
<span class="n">train_sampler</span> <span class="o">=</span> <span class="n">RandomSampler</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">)</span> <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="k">else</span> <span class="n">DistributedSampler</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">)</span>
<span class="n">train_dataloader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span>
<span class="n">train_dataset</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">train_sampler</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">train_batch_size</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">collate</span><span class="p">,</span> <span class="n">drop_last</span> <span class="o">=</span> <span class="kc">True</span>
<span class="p">)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">t_total</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span>
<span class="n">args</span><span class="o">.</span><span class="n">num_train_epochs</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span> <span class="o">//</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">)</span> <span class="o">//</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">t_total</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">)</span> <span class="o">//</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span> <span class="o">*</span> <span class="n">args</span><span class="o">.</span><span class="n">num_train_epochs</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">module</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s2">"module"</span><span class="p">)</span> <span class="k">else</span> <span class="n">model</span> <span class="c1"># Take care of distributed/parallel training</span>
<span class="n">model</span><span class="o">.</span><span class="n">resize_token_embeddings</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">tokenizer</span><span class="p">))</span>
<span class="c1"># add_special_tokens_(model, tokenizer)</span>
<span class="c1"># Prepare optimizer and schedule (linear warmup and decay)</span>
<span class="n">no_decay</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"bias"</span><span class="p">,</span> <span class="s2">"LayerNorm.weight"</span><span class="p">]</span>
<span class="n">optimizer_grouped_parameters</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"params"</span><span class="p">:</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_parameters</span><span class="p">()</span> <span class="k">if</span> <span class="ow">not</span> <span class="nb">any</span><span class="p">(</span><span class="n">nd</span> <span class="ow">in</span> <span class="n">n</span> <span class="k">for</span> <span class="n">nd</span> <span class="ow">in</span> <span class="n">no_decay</span><span class="p">)],</span>
<span class="s2">"weight_decay"</span><span class="p">:</span> <span class="n">args</span><span class="o">.</span><span class="n">weight_decay</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">{</span><span class="s2">"params"</span><span class="p">:</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_parameters</span><span class="p">()</span> <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">nd</span> <span class="ow">in</span> <span class="n">n</span> <span class="k">for</span> <span class="n">nd</span> <span class="ow">in</span> <span class="n">no_decay</span><span class="p">)],</span> <span class="s2">"weight_decay"</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">},</span>
<span class="p">]</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">AdamW</span><span class="p">(</span><span class="n">optimizer_grouped_parameters</span><span class="p">,</span> <span class="n">lr</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">learning_rate</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">adam_epsilon</span><span class="p">)</span>
<span class="n">scheduler</span> <span class="o">=</span> <span class="n">get_linear_schedule_with_warmup</span><span class="p">(</span>
<span class="n">optimizer</span><span class="p">,</span> <span class="n">num_warmup_steps</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">warmup_steps</span><span class="p">,</span> <span class="n">num_training_steps</span><span class="o">=</span><span class="n">t_total</span>
<span class="p">)</span>
<span class="c1"># Check if saved optimizer or scheduler states exist</span>
<span class="k">if</span> <span class="p">(</span>
<span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span>
<span class="ow">and</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span> <span class="s2">"optimizer.pt"</span><span class="p">))</span>
<span class="ow">and</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span> <span class="s2">"scheduler.pt"</span><span class="p">))</span>
<span class="p">):</span>
<span class="c1"># Load in optimizer and scheduler states</span>
<span class="n">optimizer</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span> <span class="s2">"optimizer.pt"</span><span class="p">)))</span>
<span class="n">scheduler</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span> <span class="s2">"scheduler.pt"</span><span class="p">)))</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">fp16</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="kn">from</span> <span class="nn">apex</span> <span class="kn">import</span> <span class="n">amp</span>
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ImportError</span><span class="p">(</span><span class="s2">"Please install apex from https://www.github.com/nvidia/apex to use fp16 training."</span><span class="p">)</span>
<span class="n">model</span><span class="p">,</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">amp</span><span class="o">.</span><span class="n">initialize</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">opt_level</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">fp16_opt_level</span><span class="p">)</span>
<span class="c1"># multi-gpu training (should be after apex fp16 initialization)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">DataParallel</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="c1"># Distributed training (should be after apex fp16 initialization)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">parallel</span><span class="o">.</span><span class="n">DistributedDataParallel</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">device_ids</span><span class="o">=</span><span class="p">[</span><span class="n">args</span><span class="o">.</span><span class="n">local_rank</span><span class="p">],</span> <span class="n">output_device</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">local_rank</span><span class="p">,</span> <span class="n">find_unused_parameters</span><span class="o">=</span><span class="kc">True</span>
<span class="p">)</span>
<span class="c1"># Train!</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"***** Running training *****"</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Num examples = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">))</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Num Epochs = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">num_train_epochs</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Instantaneous batch size per GPU = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">per_gpu_train_batch_size</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span>
<span class="s2">" Total train batch size (w. parallel, distributed & accumulation) = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span>
<span class="n">args</span><span class="o">.</span><span class="n">train_batch_size</span>
<span class="o">*</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span>
<span class="o">*</span> <span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">distributed</span><span class="o">.</span><span class="n">get_world_size</span><span class="p">()</span> <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">else</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Gradient Accumulation steps = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Total optimization steps = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">t_total</span><span class="p">)</span>
<span class="n">global_step</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">epochs_trained</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">steps_trained_in_current_epoch</span> <span class="o">=</span> <span class="mi">0</span>
<span class="c1"># Check if continuing training from a checkpoint</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span> <span class="ow">and</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="c1"># set global_step to gobal_step of last saved checkpoint from model path</span>
<span class="n">checkpoint_suffix</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"-"</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"/"</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">global_step</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">checkpoint_suffix</span><span class="p">)</span>
<span class="n">epochs_trained</span> <span class="o">=</span> <span class="n">global_step</span> <span class="o">//</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">)</span> <span class="o">//</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span><span class="p">)</span>
<span class="n">steps_trained_in_current_epoch</span> <span class="o">=</span> <span class="n">global_step</span> <span class="o">%</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">)</span> <span class="o">//</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Continuing training from checkpoint, will skip to saved global_step"</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Continuing training from epoch </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">epochs_trained</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Continuing training from global step </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">global_step</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Will skip the first </span><span class="si">%d</span><span class="s2"> steps in the first epoch"</span><span class="p">,</span> <span class="n">steps_trained_in_current_epoch</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Starting fine-tuning."</span><span class="p">)</span>
<span class="n">tr_loss</span><span class="p">,</span> <span class="n">logging_loss</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span>
<span class="n">model</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
<span class="n">train_iterator</span> <span class="o">=</span> <span class="n">trange</span><span class="p">(</span>
<span class="n">epochs_trained</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">num_train_epochs</span><span class="p">),</span> <span class="n">desc</span><span class="o">=</span><span class="s2">"Epoch"</span><span class="p">,</span> <span class="n">disable</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">set_seed</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="c1"># Added here for reproducibility</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">train_iterator</span><span class="p">:</span>
<span class="n">epoch_iterator</span> <span class="o">=</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">,</span> <span class="n">desc</span><span class="o">=</span><span class="s2">"Iteration"</span><span class="p">,</span> <span class="n">disable</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">])</span>
<span class="k">for</span> <span class="n">step</span><span class="p">,</span> <span class="n">batch</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">epoch_iterator</span><span class="p">):</span>
<span class="c1"># Skip past any already trained steps if resuming training</span>
<span class="k">if</span> <span class="n">steps_trained_in_current_epoch</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">steps_trained_in_current_epoch</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span>
<span class="k">if</span> <span class="n">inputs</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">></span> <span class="mi">1024</span><span class="p">:</span> <span class="k">continue</span>
<span class="n">inputs</span> <span class="o">=</span> <span class="n">inputs</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">labels</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># model outputs are always tuple in transformers (see doc)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">loss</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="c1"># mean() to average on multi-gpu parallel training</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">loss</span> <span class="o">/</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">fp16</span><span class="p">:</span>
<span class="k">with</span> <span class="n">amp</span><span class="o">.</span><span class="n">scale_loss</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">)</span> <span class="k">as</span> <span class="n">scaled_loss</span><span class="p">:</span>
<span class="n">scaled_loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
<span class="n">tr_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">step</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">args</span><span class="o">.</span><span class="n">gradient_accumulation_steps</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">fp16</span><span class="p">:</span>
<span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">clip_grad_norm_</span><span class="p">(</span><span class="n">amp</span><span class="o">.</span><span class="n">master_params</span><span class="p">(</span><span class="n">optimizer</span><span class="p">),</span> <span class="n">args</span><span class="o">.</span><span class="n">max_grad_norm</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">clip_grad_norm_</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">args</span><span class="o">.</span><span class="n">max_grad_norm</span><span class="p">)</span>
<span class="n">optimizer</span><span class="o">.</span><span class="n">step</span><span class="p">()</span>
<span class="n">scheduler</span><span class="o">.</span><span class="n">step</span><span class="p">()</span> <span class="c1"># Update learning rate schedule</span>
<span class="n">model</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
<span class="n">global_step</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="n">args</span><span class="o">.</span><span class="n">logging_steps</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">global_step</span> <span class="o">%</span> <span class="n">args</span><span class="o">.</span><span class="n">logging_steps</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="c1"># Log metrics</span>
<span class="k">if</span> <span class="p">(</span>
<span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="ow">and</span> <span class="n">args</span><span class="o">.</span><span class="n">evaluate_during_training</span>
<span class="p">):</span> <span class="c1"># Only evaluate when single GPU otherwise metrics may not average well</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">)</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">results</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">tb_writer</span><span class="o">.</span><span class="n">add_scalar</span><span class="p">(</span><span class="s2">"eval_</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">key</span><span class="p">),</span> <span class="n">value</span><span class="p">,</span> <span class="n">global_step</span><span class="p">)</span>
<span class="n">tb_writer</span><span class="o">.</span><span class="n">add_scalar</span><span class="p">(</span><span class="s2">"lr"</span><span class="p">,</span> <span class="n">scheduler</span><span class="o">.</span><span class="n">get_lr</span><span class="p">()[</span><span class="mi">0</span><span class="p">],</span> <span class="n">global_step</span><span class="p">)</span>
<span class="n">tb_writer</span><span class="o">.</span><span class="n">add_scalar</span><span class="p">(</span><span class="s2">"loss"</span><span class="p">,</span> <span class="p">(</span><span class="n">tr_loss</span> <span class="o">-</span> <span class="n">logging_loss</span><span class="p">)</span> <span class="o">/</span> <span class="n">args</span><span class="o">.</span><span class="n">logging_steps</span><span class="p">,</span> <span class="n">global_step</span><span class="p">)</span>
<span class="n">logging_loss</span> <span class="o">=</span> <span class="n">tr_loss</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="n">args</span><span class="o">.</span><span class="n">save_steps</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">global_step</span> <span class="o">%</span> <span class="n">args</span><span class="o">.</span><span class="n">save_steps</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">checkpoint_prefix</span> <span class="o">=</span> <span class="s2">"checkpoint"</span>
<span class="c1"># Save model checkpoint</span>
<span class="n">output_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">"</span><span class="si">{}</span><span class="s2">-</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">checkpoint_prefix</span><span class="p">,</span> <span class="n">global_step</span><span class="p">))</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">model_to_save</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">model</span><span class="o">.</span><span class="n">module</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s2">"module"</span><span class="p">)</span> <span class="k">else</span> <span class="n">model</span>
<span class="p">)</span> <span class="c1"># Take care of distributed/parallel training</span>
<span class="n">model_to_save</span><span class="o">.</span><span class="n">save_pretrained</span><span class="p">(</span><span class="n">output_dir</span><span class="p">)</span>
<span class="n">tokenizer</span><span class="o">.</span><span class="n">save_pretrained</span><span class="p">(</span><span class="n">output_dir</span><span class="p">)</span>
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">"training_args.bin"</span><span class="p">))</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Saving model checkpoint to </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">output_dir</span><span class="p">)</span>
<span class="n">_rotate_checkpoints</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">checkpoint_prefix</span><span class="p">)</span>
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">optimizer</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">"optimizer.pt"</span><span class="p">))</span>
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">scheduler</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">"scheduler.pt"</span><span class="p">))</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Saving optimizer and scheduler states to </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">output_dir</span><span class="p">)</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">global_step</span> <span class="o">></span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span><span class="p">:</span>
<span class="n">epoch_iterator</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">global_step</span> <span class="o">></span> <span class="n">args</span><span class="o">.</span><span class="n">max_steps</span><span class="p">:</span>
<span class="n">train_iterator</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]:</span>
<span class="n">tb_writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">return</span> <span class="n">global_step</span><span class="p">,</span> <span class="n">tr_loss</span> <span class="o">/</span> <span class="n">global_step</span>
<span class="c1"># Evaluation of the fine-tuned model</span>
<span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">model</span><span class="p">:</span> <span class="n">PreTrainedModel</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">:</span> <span class="n">PreTrainedTokenizer</span><span class="p">,</span> <span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">prefix</span><span class="o">=</span><span class="s2">""</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dict</span><span class="p">:</span>
<span class="c1"># Set up the evaluation output directory and dataset</span>
<span class="n">eval_output_dir</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">output_dir</span>
<span class="n">eval_dataset</span> <span class="o">=</span> <span class="n">load_and_cache_examples</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">evaluate</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">eval_output_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">args</span><span class="o">.</span><span class="n">eval_batch_size</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">per_gpu_eval_batch_size</span> <span class="o">*</span> <span class="nb">max</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span><span class="p">)</span>
<span class="c1"># Pad each batch to the length of its longest example</span>
<span class="k">def</span> <span class="nf">collate</span><span class="p">(</span><span class="n">examples</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">]):</span>
<span class="k">if</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">_pad_token</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">examples</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">examples</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span><span class="p">)</span>
<span class="n">eval_sampler</span> <span class="o">=</span> <span class="n">SequentialSampler</span><span class="p">(</span><span class="n">eval_dataset</span><span class="p">)</span>
<span class="n">eval_dataloader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span>
<span class="n">eval_dataset</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">eval_sampler</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">eval_batch_size</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">collate</span><span class="p">,</span> <span class="n">drop_last</span><span class="o">=</span><span class="kc">True</span>
<span class="p">)</span>
<span class="c1"># multi-gpu evaluate</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">DataParallel</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="c1"># Eval!</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"***** Running evaluation </span><span class="si">{}</span><span class="s2"> *****"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">prefix</span><span class="p">))</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Num examples = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">eval_dataset</span><span class="p">))</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" Batch size = </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">eval_batch_size</span><span class="p">)</span>
<span class="n">eval_loss</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="n">nb_eval_steps</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">model</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
<span class="k">for</span> <span class="n">batch</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">eval_dataloader</span><span class="p">,</span> <span class="n">desc</span><span class="o">=</span><span class="s2">"Evaluating"</span><span class="p">):</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">batch</span><span class="p">)</span>
<span class="n">inputs</span> <span class="o">=</span> <span class="n">inputs</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">labels</span><span class="p">)</span>
<span class="n">lm_loss</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">eval_loss</span> <span class="o">+=</span> <span class="n">lm_loss</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">()</span>
<span class="n">nb_eval_steps</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">eval_loss</span> <span class="o">=</span> <span class="n">eval_loss</span> <span class="o">/</span> <span class="n">nb_eval_steps</span>
<span class="n">perplexity</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">eval_loss</span><span class="p">))</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"perplexity"</span><span class="p">:</span> <span class="n">perplexity</span><span class="p">}</span>
<span class="n">output_eval_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">eval_output_dir</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="s2">"eval_results.txt"</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">output_eval_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">writer</span><span class="p">:</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"***** Eval results </span><span class="si">{}</span><span class="s2"> *****"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">prefix</span><span class="p">))</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">result</span><span class="o">.</span><span class="n">keys</span><span class="p">()):</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" </span><span class="si">%s</span><span class="s2"> = </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="n">key</span><span class="p">]))</span>
<span class="n">writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s2">"</span><span class="si">%s</span><span class="s2"> = </span><span class="si">%s</span><span class="se">\n</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="n">key</span><span class="p">])))</span>
<span class="k">return</span> <span class="n">result</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
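<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Quick aside: the perplexity that <code>evaluate</code> reports is nothing exotic — it's just the exponential of the mean per-batch language-modeling loss. A minimal standalone sketch (the helper name here is my own, not from the script above):</p>

```python
import math

def perplexity_from_losses(batch_losses):
    """Perplexity is exp(mean cross-entropy loss), exactly what
    evaluate() computes with torch.exp(torch.tensor(eval_loss))."""
    mean_loss = sum(batch_losses) / len(batch_losses)
    return math.exp(mean_loss)
```

<p>So a mean loss of 0 gives a perplexity of 1 (a perfect model), and lower perplexity is better.</p>
</div>
</div>
</div>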
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now let's put it all together into our runner function and let our baby cook away!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Main show runner</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">):</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">Args</span><span class="p">()</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">should_continue</span><span class="p">:</span>
<span class="n">sorted_checkpoints</span> <span class="o">=</span> <span class="n">_sorted_checkpoints</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sorted_checkpoints</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"Used --should_continue but no checkpoint was found in --output_dir."</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span> <span class="o">=</span> <span class="n">sorted_checkpoints</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">if</span> <span class="p">(</span>
<span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="ow">and</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="ow">and</span> <span class="n">args</span><span class="o">.</span><span class="n">do_train</span>
<span class="ow">and</span> <span class="ow">not</span> <span class="n">args</span><span class="o">.</span><span class="n">overwrite_output_dir</span>
<span class="ow">and</span> <span class="ow">not</span> <span class="n">args</span><span class="o">.</span><span class="n">should_continue</span>
<span class="p">):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span>
<span class="s2">"Output directory (</span><span class="si">{}</span><span class="s2">) already exists and is not empty. Use --overwrite_output_dir to overcome."</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">args</span><span class="o">.</span><span class="n">output_dir</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="c1"># Setup CUDA, GPU & distributed training</span>
<span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s2">"cuda"</span><span class="p">)</span>
<span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">device_count</span><span class="p">()</span>
<span class="n">args</span><span class="o">.</span><span class="n">device</span> <span class="o">=</span> <span class="n">device</span>
<span class="c1"># Setup logging</span>
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span>
<span class="nb">format</span><span class="o">=</span><span class="s2">"</span><span class="si">%(asctime)s</span><span class="s2"> - </span><span class="si">%(levelname)s</span><span class="s2"> - </span><span class="si">%(name)s</span><span class="s2"> - </span><span class="si">%(message)s</span><span class="s2">"</span><span class="p">,</span>
<span class="n">datefmt</span><span class="o">=</span><span class="s2">"%m/</span><span class="si">%d</span><span class="s2">/%Y %H:%M:%S"</span><span class="p">,</span>
<span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span> <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="k">else</span> <span class="n">logging</span><span class="o">.</span><span class="n">WARN</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span>
<span class="s2">"Process rank: </span><span class="si">%s</span><span class="s2">, device: </span><span class="si">%s</span><span class="s2">, n_gpu: </span><span class="si">%s</span><span class="s2">, distributed training: </span><span class="si">%s</span><span class="s2">, 16-bits training: </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span>
<span class="n">args</span><span class="o">.</span><span class="n">local_rank</span><span class="p">,</span>
<span class="n">device</span><span class="p">,</span>
<span class="n">args</span><span class="o">.</span><span class="n">n_gpu</span><span class="p">,</span>
<span class="nb">bool</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
<span class="n">args</span><span class="o">.</span><span class="n">fp16</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Set seed</span>
<span class="n">set_seed</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
<span class="n">config</span> <span class="o">=</span> <span class="n">AutoConfig</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">config_name</span><span class="p">,</span> <span class="n">cache_dir</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">cache_dir</span><span class="p">)</span>
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">AutoTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">tokenizer_name</span><span class="p">,</span> <span class="n">cache_dir</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">cache_dir</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelWithLMHead</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
<span class="n">args</span><span class="o">.</span><span class="n">model_name_or_path</span><span class="p">,</span>
<span class="n">from_tf</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">config</span><span class="o">=</span><span class="n">config</span><span class="p">,</span>
<span class="n">cache_dir</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">cache_dir</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Training/evaluation parameters </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
<span class="c1"># Training</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">do_train</span><span class="p">:</span>
<span class="n">train_dataset</span> <span class="o">=</span> <span class="n">load_and_cache_examples</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">evaluate</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">global_step</span><span class="p">,</span> <span class="n">tr_loss</span> <span class="o">=</span> <span class="n">train</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">train_dataset</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">" global_step = </span><span class="si">%s</span><span class="s2">, average loss = </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">global_step</span><span class="p">,</span> <span class="n">tr_loss</span><span class="p">)</span>
<span class="c1"># Saving best-practices: if you use save_pretrained for the model and tokenizer, you can reload them using from_pretrained()</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">do_train</span><span class="p">:</span>
<span class="c1"># Create output directory if needed</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Saving model checkpoint to </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="c1"># Save a trained model, configuration and tokenizer using `save_pretrained()`.</span>
<span class="c1"># They can then be reloaded using `from_pretrained()`</span>
<span class="n">model_to_save</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">model</span><span class="o">.</span><span class="n">module</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s2">"module"</span><span class="p">)</span> <span class="k">else</span> <span class="n">model</span>
<span class="p">)</span> <span class="c1"># Take care of distributed/parallel training</span>
<span class="n">model_to_save</span><span class="o">.</span><span class="n">save_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="n">tokenizer</span><span class="o">.</span><span class="n">save_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="c1"># Good practice: save your training arguments together with the trained model</span>
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">,</span> <span class="s2">"training_args.bin"</span><span class="p">))</span>
<span class="c1"># Load a trained model and vocabulary that you have fine-tuned</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelWithLMHead</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">AutoTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="c1"># Evaluation</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">do_eval</span> <span class="ow">and</span> <span class="n">args</span><span class="o">.</span><span class="n">local_rank</span> <span class="ow">in</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]:</span>
<span class="n">checkpoints</span> <span class="o">=</span> <span class="p">[</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span><span class="p">]</span>
<span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">eval_all_checkpoints</span><span class="p">:</span>
<span class="n">checkpoints</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span>
<span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">glob</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">output_dir</span> <span class="o">+</span> <span class="s2">"/**/"</span> <span class="o">+</span> <span class="n">WEIGHTS_NAME</span><span class="p">,</span> <span class="n">recursive</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
<span class="p">)</span>
<span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s2">"transformers.modeling_utils"</span><span class="p">)</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">WARN</span><span class="p">)</span> <span class="c1"># Reduce logging</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Evaluate the following checkpoints: </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">checkpoints</span><span class="p">)</span>
<span class="k">for</span> <span class="n">checkpoint</span> <span class="ow">in</span> <span class="n">checkpoints</span><span class="p">:</span>
<span class="n">global_step</span> <span class="o">=</span> <span class="n">checkpoint</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"-"</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">checkpoints</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span> <span class="k">else</span> <span class="s2">""</span>
<span class="n">prefix</span> <span class="o">=</span> <span class="n">checkpoint</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"/"</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="k">if</span> <span class="n">checkpoint</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">"checkpoint"</span><span class="p">)</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">else</span> <span class="s2">""</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelWithLMHead</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">checkpoint</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">tokenizer</span><span class="p">,</span> <span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">prefix</span><span class="o">=</span><span class="n">prefix</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">((</span><span class="n">k</span> <span class="o">+</span> <span class="s2">"_</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">global_step</span><span class="p">),</span> <span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
<span class="n">results</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="k">return</span> <span class="n">results</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">%</span><span class="k">load_ext</span> tensorboard
<span class="o">%</span><span class="k">tensorboard</span> --logdir runs
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div id="b8751277-b0dc-4b41-b8c2-c1e4c4da9141"></div>
<div class="output_subarea output_javascript ">
<script type="text/javascript">
var element = $('#b8751277-b0dc-4b41-b8c2-c1e4c4da9141');
(async () => {
const url = await google.colab.kernel.proxyPort(6006, {"cache": true});
const iframe = document.createElement('iframe');
iframe.src = url;
iframe.setAttribute('width', '100%');
iframe.setAttribute('height', '800');
iframe.setAttribute('frameborder', 0);
document.body.appendChild(iframe);
})();
</script>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Finally, we run our model! I found this can take anywhere from one to three hours, depending on the GPU Google gives you, to finish training a model that can hold a somewhat coherent conversation in Spanish. If you are using a different language, you'll have to play around with how long to cook your model for.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">main</span><span class="p">(</span><span class="n">trn_df</span><span class="p">,</span> <span class="n">val_df</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Chatting-with-our-Model">Chatting with our Model<a class="anchor-link" href="#Chatting-with-our-Model"> </a></h1><p>Now that we have our model trained, let's it out for a spin and have our first conversation with it!</p>
<p>In order to allow us to chitchat with our new bot we need to figure out when the model has finished its turn, i.e. when it has generated the [end_of_turn] token. When the model generates this token, we can switch back control of the conversation to the user so they can respond. Luckily, this is very easy to do with the Huggingface framework!</p>
<p>The below code is copied pretty much verbatim from the creators of the DialoGPT model, which you can find <a href="https://huggingface.co/microsoft/DialoGPT-small">here</a>.</p>
</div>
</div>
</div>
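<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The turn-taking logic itself is independent of any particular model: generate until the end-of-turn marker appears, then hand control back to the user. Here is a toy sketch of that control flow with a stand-in <code>fake_generate</code> in place of <code>model.generate</code> (all names in this sketch are hypothetical, not part of the tutorial code):</p>

```python
# Toy sketch of the chatbot turn-taking loop. The generator appends tokens
# until it emits the end-of-turn marker; everything after the user's own
# marker is the bot's reply, and control then returns to the user.
END_OF_TURN = "<|endofturn|>"

def fake_generate(history):
    # Stand-in for model.generate: always answers with one fixed token
    # followed by the end-of-turn marker.
    return history + ["respuesta", END_OF_TURN]

def chat_turn(history, user_text):
    # Append the user's turn, terminated by the end-of-turn marker.
    history = history + [user_text, END_OF_TURN]
    history = fake_generate(history)
    # The bot's reply is whatever came after the user's marker.
    reply = history[history.index(END_OF_TURN) + 1:]
    return history, [tok for tok in reply if tok != END_OF_TURN]

history, reply = chat_turn([], "hola")
print(reply)  # ['respuesta']
```

<p>The real loop below does the same thing with token ids: the model stops generating at <code>eos_token_id</code>, and the slice past the user's input recovers the bot's reply.</p>
</div>
</div>
</div>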
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">AutoTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="s1">'microsoft/DialoGPT-small'</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelWithLMHead</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="s1">'output'</span><span class="p">)</span>
<span class="c1"># Let's chat for 5 lines</span>
<span class="k">for</span> <span class="n">step</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">6</span><span class="p">):</span>
<span class="c1"># encode the new user input, add the eos_token and return a tensor in Pytorch</span>
<span class="n">new_user_input_ids</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s2">">> User:"</span><span class="p">)</span> <span class="o">+</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">eos_token</span><span class="p">,</span> <span class="n">return_tensors</span><span class="o">=</span><span class="s1">'pt'</span><span class="p">)</span>
<span class="c1"># print(new_user_input_ids)</span>
<span class="c1"># append the new user input tokens to the chat history</span>
<span class="n">bot_input_ids</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">([</span><span class="n">chat_history_ids</span><span class="p">,</span> <span class="n">new_user_input_ids</span><span class="p">],</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="k">if</span> <span class="n">step</span> <span class="o">></span> <span class="mi">0</span> <span class="k">else</span> <span class="n">new_user_input_ids</span>
<span class="c1"># generated a response while limiting the total chat history to 1000 tokens, </span>
<span class="n">chat_history_ids</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
<span class="n">bot_input_ids</span><span class="p">,</span> <span class="n">max_length</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
<span class="n">pad_token_id</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">eos_token_id</span><span class="p">,</span>
<span class="n">top_p</span><span class="o">=</span><span class="mf">0.92</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">50</span>
<span class="p">)</span>
<span class="c1"># pretty print last ouput tokens from bot</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"DialoGPT: </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">chat_history_ids</span><span class="p">[:,</span> <span class="n">bot_input_ids</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]:][</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">True</span><span class="p">)))</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stderr output_text">
<pre>05/13/2020 00:27:10 - INFO - filelock - Lock 139706162979168 acquired on /root/.cache/torch/transformers/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5.lock
05/13/2020 00:27:10 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/config.json not found in cache or force_download set to True, downloading to /root/.cache/torch/transformers/tmpkhif9g52
05/13/2020 00:27:10 - INFO - transformers.file_utils - storing https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/config.json in cache at /root/.cache/torch/transformers/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5
05/13/2020 00:27:10 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5
05/13/2020 00:27:10 - INFO - filelock - Lock 139706162979168 released on /root/.cache/torch/transformers/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5.lock
05/13/2020 00:27:10 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/config.json from cache at /root/.cache/torch/transformers/c3a09526c725b854c685b72cf60c50f1fea9b0e4d6227fa41573425ef4bd4bc6.4c1d7fc2ac6ddabeaf0c8bec2ffc7dc112f668f5871a06efcff113d2797ec7d5
05/13/2020 00:27:10 - INFO - transformers.configuration_utils - Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 12,
"n_positions": 1024,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"vocab_size": 50257
}
05/13/2020 00:27:10 - INFO - transformers.tokenization_utils - Model name 'microsoft/DialoGPT-small' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). Assuming 'microsoft/DialoGPT-small' is a path, a model identifier, or url to a directory containing tokenizer files.
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stderr output_text">
<pre>05/13/2020 00:27:11 - INFO - filelock - Lock 139706164883072 acquired on /root/.cache/torch/transformers/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
05/13/2020 00:27:11 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/vocab.json not found in cache or force_download set to True, downloading to /root/.cache/torch/transformers/tmpaeb7ikva
05/13/2020 00:27:12 - INFO - transformers.file_utils - storing https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/vocab.json in cache at /root/.cache/torch/transformers/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
05/13/2020 00:27:12 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
05/13/2020 00:27:12 - INFO - filelock - Lock 139706164883072 released on /root/.cache/torch/transformers/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stderr output_text">
<pre>05/13/2020 00:27:12 - INFO - filelock - Lock 139706162979168 acquired on /root/.cache/torch/transformers/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
05/13/2020 00:27:12 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/merges.txt not found in cache or force_download set to True, downloading to /root/.cache/torch/transformers/tmp4k0b0lt0
05/13/2020 00:27:13 - INFO - transformers.file_utils - storing https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/merges.txt in cache at /root/.cache/torch/transformers/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
05/13/2020 00:27:13 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
05/13/2020 00:27:13 - INFO - filelock - Lock 139706162979168 released on /root/.cache/torch/transformers/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stderr output_text">
<pre>05/13/2020 00:27:14 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/vocab.json from cache at /root/.cache/torch/transformers/78725a31b87003f46d5bffc3157ebd6993290e4cfb7002b5f0e52bb0f0d9c2dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
05/13/2020 00:27:14 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/merges.txt from cache at /root/.cache/torch/transformers/570e31eddfc57062e4d0c5b078d44f97c0e5ac48f83a2958142849b59df6bbe6.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
05/13/2020 00:27:14 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/added_tokens.json from cache at None
05/13/2020 00:27:14 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/special_tokens_map.json from cache at None
05/13/2020 00:27:14 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/DialoGPT-small/tokenizer_config.json from cache at None
05/13/2020 00:27:14 - INFO - transformers.configuration_utils - loading configuration file output/config.json
05/13/2020 00:27:14 - INFO - transformers.configuration_utils - Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 12,
"n_positions": 1024,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"vocab_size": 50257
}
05/13/2020 00:27:14 - INFO - transformers.modeling_utils - loading weights file output/pytorch_model.bin
</pre>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>>> User:Hola.
DialoGPT: ¿Qué estás haciendo?
>> User:Estoy leyendo un libro nuevo.
DialoGPT: ¿Qué leyendo?
>> User:Se llama "The Witcher."
DialoGPT: ¿Qué quieres decir?
>> User:Es un libro sobre magia y monstruos.
DialoGPT: ¿Qué quieres decir?
>> User:¿Te gusta libros?
DialoGPT: ¿Qué te pasa?
>> User:Nada mucho.
DialoGPT: ¿Por qué no me lo dijiste?
</pre>
</div>
</div>
</div>
</div>
</div>
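<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>One practical note: <code>max_length=1000</code> caps the total sequence (history plus reply), so in long conversations the history itself needs trimming before each turn or generation will have no room left. A minimal sketch of one way to do that, shown with plain lists for clarity; <code>truncate_history</code> is a hypothetical helper, not part of the tutorial code, and with a PyTorch tensor of shape <code>(1, seq_len)</code> the equivalent slice would be <code>chat_history_ids[:, -max_tokens:]</code>:</p>

```python
def truncate_history(history_ids, max_tokens=1000):
    """Keep only the most recent max_tokens token ids.

    Hypothetical helper: sliced from the end so the newest turns survive.
    With a (1, seq_len) tensor this would be history_ids[:, -max_tokens:].
    """
    return history_ids[-max_tokens:]

ids = list(range(1500))            # fake history of 1500 token ids
trimmed = truncate_history(ids)
print(len(trimmed), trimmed[0])    # 1000 500
```

<p>Dropping the oldest tokens loses early context, but it keeps the input inside the model's 1024-token window (<code>n_ctx</code> in the config above) so each new turn still has room to generate.</p>
</div>
</div>
</div>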
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now, it ain't the best; however, training it for longer or using DialoGPT-medium instead of DialoGPT-small does improve results, at least in my experiments. I decided to only include DialoGPT-small in this tutorial due to the limited (but still AMAZING) resources of Google Colab. I went ahead and trained a bigger DialoGPT-medium model for longer and uploaded it to Huggingface for anyone to try out! Find the <a href="https://huggingface.co/ncoop57/DiGPTame-medium">model card here</a>.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Conclusion">Conclusion<a class="anchor-link" href="#Conclusion"> </a></h1><p>In this tutorial, you learned how to train an Open-Dialog chatbot in any language we want to practice with! This involved learning about the amazing <code>transformers</code> library by Huggingface that has seen a lot of popularity recently. You've also learned what an Open-Dialog chatbot is and some of the difficulties that come with training them such as constructing training examples and generating repetitive text.</p>
<p>This is just part one in what I am hoping will be a three part series! In the next part, we will take our model and integrate it into a web app using the awesome <a href="https://www.streamlit.io/">Streamlit</a> library. Finally, part three will then be generating an Android application for chatting with your new language companion!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="PS">PS<a class="anchor-link" href="#PS"> </a></h1><p>If you do train a new chatbot for a language of your interest, please share it! I'd love to hear about your progress with it and I'm sure others would be also interested in it as these models can be quite expensive to train.</p>
<p>If you want an ease way to share it, I suggest submitting your trained model to Huggingface's model zoo, where others can view and download your model to use as a starting point for their applications! Here is a simple way for taking the model trained in this tutorial and uploading it to Hugginface's website following the instructions on the Huggingface <a href="https://huggingface.co/transformers/model_sharing.html">website</a>:</p>
<p>First make sure you have a Huggingface account: <a href="https://huggingface.co/join">https://huggingface.co/join</a>. Next Run the following code snippets and that's it!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> rm -rf output/checkpoint-*
<span class="o">!</span> mv output <name_of_model>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> transformers-cli login
<span class="c1"># log in using the same credentials as on huggingface.co</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> transformers-cli upload <name_of_model>
</pre></div>
</div>
</div>
</div>
</div>
</div>
<script type="application/vnd.jupyter.widget-state+json">
{"5a3aefdbfa4140aeaa8eacef6e05d527": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_42ab64db40464775abbd563d2d398130", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_ad54c5a7b2e04c4d8c0e74563bf68445", "IPY_MODEL_5b37bf4feaae4f0a8b259bd1e23df457"]}}, "42ab64db40464775abbd563d2d398130": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "ad54c5a7b2e04c4d8c0e74563bf68445": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_21147020f28c4941896a983e7d3dee58", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 554, 
"_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 554, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_78bcb3bb3aa8439aaebbd68e0d5f2d8a"}}, "5b37bf4feaae4f0a8b259bd1e23df457": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_99dbd6be5282455abe545fec10ad8dfa", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 554/554 [00:02<00:00, 259B/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_5ffee55ffa394080a07609226086b803"}}, "21147020f28c4941896a983e7d3dee58": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "78bcb3bb3aa8439aaebbd68e0d5f2d8a": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": 
"LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "99dbd6be5282455abe545fec10ad8dfa": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "5ffee55ffa394080a07609226086b803": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "aded096bbf744fb0a1b05880b3522cae": 
{"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_71986f23b3c348f395365d0da817c90a", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_d8bd8ad1181344c49ccd46db1c7ae3ad", "IPY_MODEL_4f90af41548e4eda94999d1241b52414"]}}, "71986f23b3c348f395365d0da817c90a": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "d8bd8ad1181344c49ccd46db1c7ae3ad": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_a6cc8cfe61d84d0a9e4904869371b753", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 1042301, "_view_module": 
"@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 1042301, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_10805dbd211940c2ba320e17c1df1555"}}, "4f90af41548e4eda94999d1241b52414": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_231b27853c0547f3bc6b5600ee2fde9b", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 1.04M/1.04M [00:01<00:00, 851kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_6588e145be214b74a3daba8c33d668b0"}}, "a6cc8cfe61d84d0a9e4904869371b753": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "10805dbd211940c2ba320e17c1df1555": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", 
"justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "231b27853c0547f3bc6b5600ee2fde9b": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "6588e145be214b74a3daba8c33d668b0": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "bd9a26c919cd43e68af659f08c840ed7": {"model_module": 
"@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_d776fe4027a24ff2b7a05a4bf93725f2", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_2ce62470d0b845bb877bd8e284af1783", "IPY_MODEL_4db46f7fa885490db935bb3ed0533e8f"]}}, "d776fe4027a24ff2b7a05a4bf93725f2": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "2ce62470d0b845bb877bd8e284af1783": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_31e22b9fd09a4aa99b496944d9ff5aa8", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 456318, "_view_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "value": 456318, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_e6d3aff091e34a8fb23a61568c31cd6b"}}, "4db46f7fa885490db935bb3ed0533e8f": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_c0f3a2716bec4dca9827ff0fbafd43ea", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 456k/456k [00:01<00:00, 286kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_daf5041e797e40c3b02e8649e1bbba4f"}}, "31e22b9fd09a4aa99b496944d9ff5aa8": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "e6d3aff091e34a8fb23a61568c31cd6b": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, 
"grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "c0f3a2716bec4dca9827ff0fbafd43ea": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "daf5041e797e40c3b02e8649e1bbba4f": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "e45254bb4f794a88a929c6c857afbb83": {"model_module": "@jupyter-widgets/controls", 
"model_name": "HBoxModel", "state": {"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_2218bf1642b04d5ba0765f0f734ceb7c", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_8d47d84a9ad8463f879072489a607e86", "IPY_MODEL_ba2b3e9b59b3445ba84382e0ca5e2be8"]}}, "2218bf1642b04d5ba0765f0f734ceb7c": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "8d47d84a9ad8463f879072489a607e86": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_64594d21ea31486592fb7faf9f236f95", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 554, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", 
"value": 554, "_view_count": null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_3e98ca9215b747068270fa7d64f3899f"}}, "ba2b3e9b59b3445ba84382e0ca5e2be8": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_0e022de8639340b7a26a6ce701961a57", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 554/554 [00:00<00:00, 639B/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_e2d2843ff50e4b448beca35526f523c3"}}, "64594d21ea31486592fb7faf9f236f95": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "3e98ca9215b747068270fa7d64f3899f": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, 
"align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "0e022de8639340b7a26a6ce701961a57": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "e2d2843ff50e4b448beca35526f523c3": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "3281f9ba3a2440c0aa28c42745a03df2": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": 
{"_view_name": "HBoxView", "_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_f8672f80090940d18519e4684c900703", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_f232c70932c64df5b7cfb1e4096a0129", "IPY_MODEL_3bdca6d24d80427e9b4ef8f30eb622df"]}}, "f8672f80090940d18519e4684c900703": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "f232c70932c64df5b7cfb1e4096a0129": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_4a4ab84ce1fe44e396c90dc87f877613", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 1042301, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 1042301, "_view_count": 
null, "_view_module_version": "1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_9e66001b01e24173b3d062c7857bfbc9"}}, "3bdca6d24d80427e9b4ef8f30eb622df": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_7f46e9511a9a46a4b2184b486a3e73be", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 1.04M/1.04M [00:02<00:00, 356kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_e718f9dbf16e41508d4e139f898feaa0"}}, "4a4ab84ce1fe44e396c90dc87f877613": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "9e66001b01e24173b3d062c7857bfbc9": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, 
"visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "7f46e9511a9a46a4b2184b486a3e73be": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "e718f9dbf16e41508d4e139f898feaa0": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "05b0d970c6de4ccea7bf268e390974a4": {"model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": {"_view_name": "HBoxView", 
"_dom_classes": [], "_model_name": "HBoxModel", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.5.0", "box_style": "", "layout": "IPY_MODEL_c4ab36adf4c24640aade28cd6cbccf2c", "_model_module": "@jupyter-widgets/controls", "children": ["IPY_MODEL_90009b2929e34639bef82bdfdcb99d79", "IPY_MODEL_e2652b36d9dc4cdda2a9b79ccf42eb7c"]}}, "c4ab36adf4c24640aade28cd6cbccf2c": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "90009b2929e34639bef82bdfdcb99d79": {"model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": {"_view_name": "ProgressView", "style": "IPY_MODEL_b502bcff4fb5479fba3b4acd5c155614", "_dom_classes": [], "description": "Downloading: 100%", "_model_name": "FloatProgressModel", "bar_style": "success", "max": 456318, "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": 456318, "_view_count": null, "_view_module_version": 
"1.5.0", "orientation": "horizontal", "min": 0, "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_c958b278aa7b47e084b3f65d07a51d99"}}, "e2652b36d9dc4cdda2a9b79ccf42eb7c": {"model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": {"_view_name": "HTMLView", "style": "IPY_MODEL_063970ce61cb4156a7925955b7e2518d", "_dom_classes": [], "description": "", "_model_name": "HTMLModel", "placeholder": "\u200b", "_view_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "value": " 456k/456k [00:01<00:00, 303kB/s]", "_view_count": null, "_view_module_version": "1.5.0", "description_tooltip": null, "_model_module": "@jupyter-widgets/controls", "layout": "IPY_MODEL_ec3bce1043cb47d19d99b9201f9a132c"}}, "b502bcff4fb5479fba3b4acd5c155614": {"model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": {"_view_name": "StyleView", "_model_name": "ProgressStyleModel", "description_width": "initial", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "bar_color": null, "_model_module": "@jupyter-widgets/controls"}}, "c958b278aa7b47e084b3f65d07a51d99": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, 
"height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}, "063970ce61cb4156a7925955b7e2518d": {"model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": {"_view_name": "StyleView", "_model_name": "DescriptionStyleModel", "description_width": "", "_view_module": "@jupyter-widgets/base", "_model_module_version": "1.5.0", "_view_count": null, "_view_module_version": "1.2.0", "_model_module": "@jupyter-widgets/controls"}}, "ec3bce1043cb47d19d99b9201f9a132c": {"model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": {"_view_name": "LayoutView", "grid_template_rows": null, "right": null, "justify_content": null, "_view_module": "@jupyter-widgets/base", "overflow": null, "_model_module_version": "1.2.0", "_view_count": null, "flex_flow": null, "width": null, "min_width": null, "border": null, "align_items": null, "bottom": null, "_model_module": "@jupyter-widgets/base", "top": null, "grid_column": null, "overflow_y": null, "overflow_x": null, "grid_auto_flow": null, "grid_area": null, "grid_template_columns": null, "flex": null, "_model_name": "LayoutModel", "justify_items": null, "grid_row": null, "max_height": null, "align_content": null, "visibility": null, "align_self": null, "height": null, "min_height": null, "padding": null, "grid_auto_rows": null, "grid_gap": null, "max_width": null, "order": null, "_view_module_version": "1.2.0", "grid_template_areas": null, "object_position": null, "object_fit": null, "grid_auto_columns": null, "margin": null, "display": null, "left": null}}}
</script>What I Learned (WIL) Neuroscience Month [Part 1]2020-04-07T00:00:00-05:002020-04-07T00:00:00-05:00https://nathancooper.io/i-am-a-nerd/wil/neuroscience/2020/04/07/neuroscience-1<p>If you are like me, you may have thought there was quite a lot of mystery going on in your brain that the scientific community has yet to really figure out and understand. However, through my month-long study I found that neuroscience has some powerful computational models and experiments that are able to explain many of the processes going on in the brain, such as the visual system, how we store memories, and the process we call intuition. Of course, there are still unanswered questions, such as what consciousness is and how we are able to learn things with just a few examples, but I was shocked by how much is known.</p>
<p>I will be going through some of the amazing things that are happening in that brain of yours through this series. This first part is devoted to the “<strong>High Level</strong>” stuff, which involves the more complicated behavior such as how different parts of your brain contribute to your conscious experience, how you are able to solve seemingly difficult problems with ease, and how this intuition of yours probably contributes to many of the logical errors you make. Let’s get right into it!</p>
<h1 id="high-level-stuff">High Level Stuff</h1>
<h2 id="system-1-and-system-2">System 1 and System 2</h2>
<p>In the amazing book “<strong><a href="https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555">Thinking Fast and Slow</a></strong>” by Daniel Kahneman, a world-renowned psychologist and Nobel laureate in economics, the high level ways in which we “think” are examined. The book explores things like decisions, problem solving, emotional states, and why we make errors in our logic, both consciously and subconsciously. I highly recommend it; I personally have been enjoying the <a href="https://www.audible.com/pd/Thinking-Fast-and-Slow-Audiobook/B005TKKCWC">Audiobook</a> version!</p>
<p><img src="/i-am-a-nerd/images/neuroscience/systems.png" alt="Overview of the two different Systems that make up your thinking." /></p>
<p><em>Overview of the two different Systems that make up your thinking.</em></p>
<p>Daniel Kahneman introduced the idea of two systems of thought, which he named very creatively System 1 and System 2. System 1 is more associated with intuition or <em>fast thinking</em> that processes things without much mental strain, such as looking at a picture of a cat and understanding that what you are currently looking at is in fact a cat. System 1 is also automatic in the sense that you have no control over coming to the realization that a cat is in the picture. System 2, on the other hand, is involved in more deliberate and difficult processing that requires you to put in work to solve some task, such as calculating that 18 multiplied by 32 equals 576 (don’t worry, I’ll wait :)).</p>
<p><img src="/i-am-a-nerd/images/neuroscience/image_0.jpg" alt="A Cat" /></p>
<p><em>A Cat: Image by <strong><a href="https://pixabay.com/photos/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=984097">Free-Photos</a></strong> from <strong><a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=984097">Pixabay</a></strong></em></p>
<p>Yoshua Bengio, who is a pioneer of artificial intelligence and a recent recipient of the Turing Award along with other amazing scientists in the field, put it quite well in describing how these two systems work together. System 1 is great at generating representations of things and associating them with high level objects such as cats, words, and concepts. These representations are then exploited by System 2 to avoid all of the nitty gritty details of what it means for a cat to be a cat. It instead uses these concepts to do interesting things such as finding relationships between multiple objects or performing complex calculations by planning out a series of operations to perform, i.e. generating and following an algorithm for multiplying two numbers.</p>
<p>Daniel Kahneman also describes many fun experiments showing how System 1 runs most of your day-to-day experience and how this leads to a lot of logical errors. One famous example is the bat-and-ball problem. Try to solve the following:</p>
<blockquote>
<p>The cost of a bat and ball comes to a total of 1 dollar and 10 cents. If the bat costs 1 dollar more than the ball, how much does the ball cost?</p>
</blockquote>
<p>If you guessed the ball costs 10 cents like I did, you were wrong, and you were wrong because your System 2 is very, very lazy. A simple calculation will show that if the ball cost 10 cents and the bat is 1 dollar more, the bat would be 1 dollar and 10 cents, bringing the total up to 1 dollar and 20 cents. The correct answer is that the ball costs 5 cents.</p>
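For the skeptics, the arithmetic behind the right answer can be sketched in a couple of lines (a quick illustrative check I wrote up, not something from the book):

```python
# The two constraints: bat + ball = 1.10 and bat = ball + 1.00
# Substituting: (ball + 1.00) + ball = 1.10  =>  2 * ball = 0.10  =>  ball = 0.05
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")  # ball = $0.05, bat = $1.05

assert abs((bat + ball) - 1.10) < 1e-9  # the total checks out
assert abs((bat - ball) - 1.00) < 1e-9  # and so does the difference
```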
<p>The above example and many more show a common theme in how our minds work. System 1 is at fault here: it recognized some similarity in the problem and automatically offered up what seemed like a reasonable answer. However, it never even tried to check whether the answer was correct. This checking is done by System 2, but System 1 was so sure of itself that System 2 did not even bother to verify the answer, because verification takes work. System 2 is involved in slow and deliberate thinking such as performing calculations or having to consider multiple pieces of information when making a decision.</p>
<p>One interesting effect that occurs when people are actively using their System 2 to solve a task is that it loosens their inhibitions. A study described in “<strong>Thinking Fast and Slow</strong>” showed that people who were asked during an experiment whether they would like a sweet treat such as a slice of chocolate cake or a salad were more likely to choose the chocolate cake if they were also given the task of keeping 7 digits in their mind for a few minutes. It has also been shown that performing System 2 style tasks has an impact on the way people behave, such as increased selfishness and even increased use of sexist language. This is all due to System 2 style tasks requiring focus and attention on the task at hand, allowing System 1, which is known to make quick judgement calls without much thought, to take over any additional tasks you are not focusing on.</p>
<p>Our daily lives and thought processes revolve around when to use System 2 for tasks that System 1 has no way of generating a <em>good</em> answer for. You may ask why we do not use System 2 for more tasks, but as discussed previously, System 2 is <em>slow</em>. It would be quite dangerous if we relied on System 2 to figure out how to avoid crashing into a vehicle on the highway that unexpectedly started merging into your lane without a turn signal, or to duck to avoid being hit by a fastball that’s outside of the batter’s box. An interesting feature of System 2 style tasks, however, is that if they are seen often enough, they can be upgraded to System 1 style tasks by having System 1 learn to recognize the expected answer. This is a common occurrence when driving an <em>unfamiliar</em> path. The first few times you drive it, your System 2 is more attentive to make sure you get to where you are going without getting lost. Eventually, if you take the same path enough, it becomes <em>familiar</em> and your System 1 is able to take over, allowing you to reach your destination without much attention being paid to the path you are taking.</p>
<h2 id="brain-structure">Brain Structure</h2>
<p>Now, your brain does not actually have these System 1 and System 2 structures physically; they are just a great way of discussing the way in which your brain works. However, the real way your brain is structured is a lot messier, but still beautiful :). I found the explanation of how your brain is wired in “<strong><a href="https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/0/0e/ccnbook_08_2016.pdf">Computational Cognitive Neuroscience</a></strong>” by Munakata et al. to be a great resource for learning, so I will be using it as my main source for the rest of this article. While it is a lot denser than “<strong>Thinking Fast and Slow,</strong>” it has great visualizations and goes quite in depth.</p>
<p><img src="/i-am-a-nerd/images/neuroscience/image_1.png" alt="Different lobes of the human brain." /></p>
<p><em>Image by <strong><a href="https://pixabay.com/users/ArtsyBee-462611/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=1007686">Oberholster Venita</a></strong> from <strong><a href="https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=1007686">Pixabay</a></strong> (Modified to have labels)</em></p>
<p>Your brain can be organized into two main parts: the Neocortex, which is what most people think of when they imagine a brain, and the Cerebellum, which is not as well known but is believed to play a big part in how we think. The Neocortex is the coloured part in the above image, and it can be roughly broken down into 4 main <em>lobes</em>; each lobe is heavily dependent on the others, though, so don’t think of them as truly distinct sections. You can see the responsibilities of each lobe in the table below.</p>
<p><img src="/i-am-a-nerd/images/neuroscience/image_3.png" alt="Overall responsibilities of the different lobes of the brain." /></p>
<p><em>Overall responsibilities of the different lobes of the brain.</em></p>
<p>One of the most interesting parts of your brain’s structure is how representations of your senses, like sight and sound, are built up in a hierarchical fashion as your brain moves the information across the different lobes. Take your ability to easily understand what your eyes are currently seeing. This information is first processed hierarchically by the Occipital lobe at the very back of the brain, starting with identifying simple edges and then moving on to groups of edges that form basic shapes. This representation grows in complexity as the information is sent towards other parts of the brain like the Temporal lobe, where these representations are given semantic meaning in the form of words such as Cat or Human, or even something more specific such as Garfield the Cat. If we track the representation your brain is building of what you are seeing into the Parietal lobe, you will find your brain generating representations for relations between the different objects in the scene, such as the Cat hanging from a tree (hang in there buddy!). Lastly, the Frontal lobe takes all these high level representations to perform any number of high level decisions and motor control movements, such as petting the kitty :).</p>
<p>As you can see, even this very simple example of just processing what your eyes see involves multiple parts of your brain, even if a lot of the initial work is done by the Occipital lobe. Each lobe does its part in helping generate an understanding of the sensory input you are receiving. This is not limited to vision; the same applies to your other senses such as smell, touch, and hearing. It’s all connected…</p>
<p><img src="/i-am-a-nerd/images/neuroscience/image_2.png" alt="Pepe Silvia Meme from It’s Always Sunny in Philadelphia. It's all connected" /></p>
<p><em><a href="https://knowyourmeme.com/memes/pepe-silvia">Pepe Silvia</a> from It’s Always Sunny in Philadelphia</em></p>
<h1 id="conclusions">Conclusions</h1>
<p>Well, sadly that is all the time we are going to spend on the high level view of the brain. There are tons of additional things I could discuss, such as how the Cerebellum actually contains half of all the neurons in your entire brain, and how many of its functions are not well understood because it seems to have its hands in processing everything! I could also discuss the different ways parts of your brain represent input, such as clusterization, hashing, and composing different representations into new representations. However, we have already covered a lot of the super cool stuff! I hope you have enjoyed this first part of the Neuroscience series and have come away with a better, or at least more confused :), sense of how your brain works!</p>
<p>I will be working to get the next <em>two parts</em> (<strong>Mid Level</strong> and <strong>Low Level</strong>) out soon as there is not much else for me to do during this COVID-19 quarantine (don’t tell my Ph.D. advisor I said that). If you have any questions or have any comments about any interesting stuff you know about the brain please comment down below, I’d love to hear it!</p>If you are like me, you may have thought there was quite a lot of mystery going on in your brain that the scientific community has yet to really figure out and understand. However, through my month long study I found that neuroscience has some powerful computational models and experiments that are able to explain many of the processes going on in the brain such as the visual system, how we store memories, and the process we call intuition. Of course there are still unanswered questions such as what consciousness is and how are we able to learn things with just a few examples, but I was shocked how much is known.How to Create an Automatic Code Comment Generator using Deep Learning!2020-03-07T00:00:00-06:002020-03-07T00:00:00-06:00https://nathancooper.io/i-am-a-nerd/deep_learning/software_engineering/2020/03/07/How_to_Create_an_Automatic_Code_Comment_Generator_using_Deep_Learning<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: _notebooks/2020-03-07-How_to_Create_an_Automatic_Code_Comment_Generator_using_Deep_Learning.ipynb
-->
<div class="container" id="notebook-container">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="About">About<a class="anchor-link" href="#About"> </a></h1>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In this post, you will create a Deep Learning model that, given a piece of code, will automatically generate a comment describing (hopefully 🤞) what the piece of code does. This post focuses on Java code; however, the same approach should apply to other programming languages such as Python or JavaScript.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Collecting,-preparing-and-exploring-the-data">Collecting, preparing and exploring the data<a class="anchor-link" href="#Collecting,-preparing-and-exploring-the-data"> </a></h1><p>You will be using the <a href="https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/">CodeSearchNet Challenge</a> dataset from GitHub as it provides a large collection of clean code in multiple different languages. They have a really nice <a href="https://github.com/github/CodeSearchNet/blob/master/notebooks/ExploreData.ipynb">example</a> on how to download and read in the data in their repo that you'll use to get started.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="o">!</span> wget https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/java.zip
<span class="o">!</span> unzip java.zip
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>--2020-03-07 16:37:37-- https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/java.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.179.149
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.179.149|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1060569153 (1011M) [application/zip]
Saving to: ‘java.zip’
java.zip 100%[===================>] 1011M 88.0MB/s in 12s
2020-03-07 16:37:49 (84.9 MB/s) - ‘java.zip’ saved [1060569153/1060569153]
Archive: java.zip
creating: java/
creating: java/final/
creating: java/final/jsonl/
creating: java/final/jsonl/train/
inflating: java/final/jsonl/train/java_train_12.jsonl.gz
inflating: java/final/jsonl/train/java_train_9.jsonl.gz
inflating: java/final/jsonl/train/java_train_3.jsonl.gz
inflating: java/final/jsonl/train/java_train_5.jsonl.gz
inflating: java/final/jsonl/train/java_train_7.jsonl.gz
inflating: java/final/jsonl/train/java_train_1.jsonl.gz
inflating: java/final/jsonl/train/java_train_10.jsonl.gz
inflating: java/final/jsonl/train/java_train_14.jsonl.gz
inflating: java/final/jsonl/train/java_train_0.jsonl.gz
inflating: java/final/jsonl/train/java_train_6.jsonl.gz
inflating: java/final/jsonl/train/java_train_8.jsonl.gz
inflating: java/final/jsonl/train/java_train_15.jsonl.gz
inflating: java/final/jsonl/train/java_train_2.jsonl.gz
inflating: java/final/jsonl/train/java_train_4.jsonl.gz
inflating: java/final/jsonl/train/java_train_13.jsonl.gz
inflating: java/final/jsonl/train/java_train_11.jsonl.gz
creating: java/final/jsonl/test/
inflating: java/final/jsonl/test/java_test_0.jsonl.gz
creating: java/final/jsonl/valid/
inflating: java/final/jsonl/valid/java_valid_0.jsonl.gz
inflating: java_dedupe_definitions_v2.pkl
inflating: java_licenses.pkl
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The <code>jsonl_list_to_dataframe</code> method comes directly from the CodeSearchNet Challenge example code, and <code>get_dfs</code> is just a helper to read the data into the correct training, validation, and testing splits. Let's see what your data looks like :D!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">jsonl_list_to_dataframe</span><span class="p">(</span><span class="n">file_list</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Load a list of jsonl.gz files into a pandas DataFrame."""</span>
<span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">pd</span><span class="o">.</span><span class="n">read_json</span><span class="p">(</span><span class="n">f</span><span class="p">,</span>
<span class="n">orient</span><span class="o">=</span><span class="s1">'records'</span><span class="p">,</span>
<span class="n">compression</span><span class="o">=</span><span class="s1">'gzip'</span><span class="p">,</span>
<span class="n">lines</span><span class="o">=</span><span class="kc">True</span><span class="p">)[</span><span class="n">columns</span><span class="p">]</span>
<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">file_list</span><span class="p">],</span> <span class="n">sort</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_dfs</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="sd">"""Grabs the different data splits and converts them into dataframes"""</span>
<span class="n">dfs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">split</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">"train"</span><span class="p">,</span> <span class="s2">"valid"</span><span class="p">,</span> <span class="s2">"test"</span><span class="p">]:</span>
<span class="n">files</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">((</span><span class="n">path</span><span class="o">/</span><span class="n">split</span><span class="p">)</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s2">"**/*.gz"</span><span class="p">))</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">jsonl_list_to_dataframe</span><span class="p">(</span><span class="n">files</span><span class="p">,</span> <span class="p">[</span><span class="s2">"code"</span><span class="p">,</span> <span class="s2">"docstring"</span><span class="p">])</span>
<span class="n">dfs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
<span class="k">return</span> <span class="n">dfs</span>
<span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">df_tst</span> <span class="o">=</span> <span class="n">get_dfs</span><span class="p">(</span><span class="n">path</span><span class="o">/</span><span class="s2">"java/final/jsonl"</span><span class="p">)</span>
<span class="n">df_trn</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea output_execute_result">
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>code</th>
<th>docstring</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>protected final void bindIndexed(Configuration...</td>
<td>Bind indexed elements to the supplied collecti...</td>
</tr>
<tr>
<th>1</th>
<td>public void setServletRegistrationBeans(\n\t\t...</td>
<td>Set {@link ServletRegistrationBean}s that the ...</td>
</tr>
<tr>
<th>2</th>
<td>public void addServletRegistrationBeans(\n\t\t...</td>
<td>Add {@link ServletRegistrationBean}s for the f...</td>
</tr>
<tr>
<th>3</th>
<td>public void setServletNames(Collection<String>...</td>
<td>Set servlet names that the filter will be regi...</td>
</tr>
<tr>
<th>4</th>
<td>public void addServletNames(String... servletN...</td>
<td>Add servlet names for the filter.\n@param serv...</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You are only going to use a small subset of the data in order to train your model in a reasonable time. If you want to adjust the amount of data used, just change the sample fraction.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">sample</span> <span class="o">=</span> <span class="mf">0.2</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">sample</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Awesome! Now that you have the data, there are a few other preprocessing steps you need to perform. First, you will remove any non-English comments. Next, you will also remove the JavaDoc tags, i.e., any line with an @ symbol or text in curly braces, as that will significantly lessen the amount of learning your model has to do. This also works out well since the JavaDoc syntax can usually be autogenerated from the method's signature.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># From https://stackoverflow.com/a/27084708/5768407</span>
<span class="k">def</span> <span class="nf">isASCII</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">s</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'ascii'</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">UnicodeDecodeError</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">False</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">isASCII</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">isASCII</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">isASCII</span><span class="p">(</span><span class="n">x</span><span class="p">))]</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">filter_jdocs</span><span class="p">(</span><span class="n">df</span><span class="p">):</span>
<span class="n">methods</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">comments</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">progress_bar</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">())):</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s2">"docstring"</span><span class="p">]</span>
<span class="c1"># Remove {} text in comments from https://stackoverflow.com/questions/14596884/remove-text-between-and-in-python/14598135</span>
<span class="n">comment</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s2">"([\{\[]).*?([\)\}])"</span><span class="p">,</span> <span class="s1">''</span><span class="p">,</span> <span class="n">comment</span><span class="p">)</span>
<span class="n">cleaned</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">comment</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">):</span>
<span class="k">if</span> <span class="s2">"@"</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span> <span class="k">break</span>
<span class="n">cleaned</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="n">comments</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">cleaned</span><span class="p">))</span>
<span class="n">methods</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="s2">"code"</span><span class="p">])</span>
<span class="n">new_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">methods</span><span class="p">,</span> <span class="n">comments</span><span class="p">),</span> <span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"code"</span><span class="p">,</span> <span class="s2">"docstring"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">new_df</span>
<span class="n">df_trn</span> <span class="o">=</span> <span class="n">filter_jdocs</span><span class="p">(</span><span class="n">df_trn</span><span class="p">);</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">filter_jdocs</span><span class="p">(</span><span class="n">df_val</span><span class="p">);</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">filter_jdocs</span><span class="p">(</span><span class="n">df_tst</span><span class="p">);</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now you are going to remove any empty or duplicate comments from your datasets.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="o">~</span><span class="p">(</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">''</span><span class="p">)]</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">df_trn</span> <span class="o">=</span> <span class="n">df_trn</span><span class="p">[</span><span class="o">~</span><span class="n">df_trn</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
<span class="n">df_val</span> <span class="o">=</span> <span class="n">df_val</span><span class="p">[</span><span class="o">~</span><span class="n">df_val</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
<span class="n">df_tst</span> <span class="o">=</span> <span class="n">df_tst</span><span class="p">[</span><span class="o">~</span><span class="n">df_tst</span><span class="p">[</span><span class="s1">'docstring'</span><span class="p">]</span><span class="o">.</span><span class="n">duplicated</span><span class="p">()]</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Not bad, still leaves you with plenty of data to learn with!</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_tst</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>(73755, 2427, 4615)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Exploring-your-data!">Exploring your data!<a class="anchor-link" href="#Exploring-your-data!"> </a></h2><p>As a good machine learning practitioner, it is extremely important to be careful with your data. This includes checking for biases and duplicates, and also describing the data that you have. Not doing so is setting yourself up for disaster. I have personally experienced such a travesty when working on one of my own research projects, where I forgot to check for duplicates before splitting my data. Sadly for me and all my restless nights working on the project, the data was full of duplicates, so my test set was contaminated with data points from my training set, which led to inflated evaluation metrics :(.</p>
<p><strong>Always explore your data!</strong></p>
</div>
</div>
</div>
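<p>The contamination described above is cheap to check for before any modeling. Below is a minimal sketch, using made-up toy splits standing in for this post's <code>df_trn</code> and <code>df_tst</code> (the docstrings are hypothetical), that finds docstrings shared between train and test and drops the leaked test rows:</p>

```python
import pandas as pd

# Toy splits standing in for df_trn / df_tst (hypothetical docstrings).
df_trn = pd.DataFrame({"docstring": ["Adds two ints.", "Closes the file."]})
df_tst = pd.DataFrame({"docstring": ["Closes the file.", "Parses the config."]})

# Any docstring appearing in both splits is potential leakage: the model
# could memorize it during training and look artificially good at test time.
leaked = set(df_trn["docstring"]) & set(df_tst["docstring"])
print(f"{len(leaked)} leaked docstring(s): {leaked}")

# Drop test rows whose docstring was already seen in training.
df_tst_clean = df_tst[~df_tst["docstring"].isin(df_trn["docstring"])]
```

<p>Running a check like this on both the <code>code</code> and <code>docstring</code> columns before training is a few seconds of work that can save you weeks of misleading results.</p>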
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You'll be doing some basic descriptive statistics for this Exploratory Data Analysis (EDA), which just means calculating some means, medians, and standard deviations for different views of your data. The first view you will be exploring is the tokens that make up your code and comments. To split your data into tokens, you will use something called Byte Pair Encoding (BPE), which has shown great results for tokenizing both natural language and code, as shown in Karampatsis and Sutton's paper <a href="https://arxiv.org/abs/1903.05734">"Maybe Deep Neural Networks are the Best Choice for Modeling Source Code."</a></p>
<p>Great resources for learning more about how Byte Pair Encoding works are this <a href="https://towardsdatascience.com/byte-pair-encoding-the-dark-horse-of-modern-nlp-eb36c7df4f10">blog post</a> by Akashdeep Singh Jaswal and this <a href="https://youtu.be/9oTHFx0Gg3Q">YouTube video</a> by Christopher Manning. Specifically, you will be using the awesome library by Google called <a href="https://github.com/google/sentencepiece">sentencepiece</a>.</p>
</div>
</div>
</div>
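<p>The sentencepiece library handles all of this for you, but the core merge rule behind BPE is easy to sketch. The toy example below (just an illustration of one merge step, not sentencepiece's actual implementation, which layers much more on top) counts adjacent symbol pairs in a tiny word-frequency vocabulary and merges the most frequent pair into a new token; repeating this builds up subword units like <code>lo</code> that can cover words never seen in training:</p>

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs across a corpus of tokenized words."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if symbols[i:i + 2] == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word is a tuple of characters with a frequency.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("l", "o", "g"): 3, ("n", "e", "w"): 4}
pair = most_frequent_pair(vocab)  # ("l", "o") appears 10 times, so it wins
vocab = merge_pair(vocab, pair)   # "l"+"o" becomes the single token "lo"
```

<p>Each merge adds one entry to the subword vocabulary, so the number of merges directly controls the vocabulary size.</p>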
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">df_to_txt_file</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">col</span><span class="p">):</span>
<span class="sd">"""Converts a dataframe and converts it into a text file that SentencePiece can use to train a BPE model"""</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">output</span><span class="o">/</span><span class="s1">'text.txt'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">col</span><span class="p">])))</span>
<span class="k">return</span> <span class="n">output</span><span class="o">/</span><span class="s1">'text.txt'</span>
<span class="k">def</span> <span class="nf">gen_sp_model</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">tokenizer_name</span><span class="p">,</span> <span class="n">col</span><span class="p">):</span>
<span class="sd">"""Trains a SentencePiece BPE model from a pandas dataframe"""</span>
<span class="n">fname</span> <span class="o">=</span> <span class="n">df_to_txt_file</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">col</span><span class="p">)</span>
<span class="n">sp</span><span class="o">.</span><span class="n">SentencePieceTrainer</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="sa">f</span><span class="s1">'--input=</span><span class="si">{</span><span class="n">fname</span><span class="si">}</span><span class="s1"> --model_prefix=</span><span class="si">{</span><span class="n">output</span> <span class="o">/</span> <span class="n">tokenizer_name</span><span class="si">}</span><span class="s1"> --hard_vocab_limit=false'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To use Byte Pair Encoding, you first have to train the tokenizer on your data. There is no need to train it on all of your data, though, so you will use just a subset (10%) of the training set. Training the BPE model only on the training set avoids inadvertent data snooping: otherwise the model would be biased toward tokenizing words that are common in your validation or testing sets. It also helps demonstrate that you are indeed solving the out-of-vocabulary problem, since your testing set will almost certainly contain words that never appeared in your training set.</p>
</div>
</div>
</div>
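<p>To see why this solves the out-of-vocabulary problem, here is a toy sketch (not the actual SentencePiece algorithm, and with a hypothetical hand-picked vocabulary): once a subword vocabulary has been learned, an unseen word can still be segmented into known pieces instead of collapsing to a single unknown token.</p>

```python
def segment(word, vocab):
    """Greedy longest-match segmentation of `word` into subword pieces."""
    pieces = []
    i = 0
    while i < len(word):
        # Take the longest substring starting at i that is in the vocabulary.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Fall back to a single character (BPE vocabularies contain these).
            pieces.append(word[i])
            i += 1
    return pieces

# Hypothetical subword vocabulary for illustration only.
vocab = {"get", "set", "token", "izer", "count", "er", "s"}
print(segment("tokenizers", vocab))   # ['token', 'izer', 's']
print(segment("getcounter", vocab))   # ['get', 'count', 'er']
```

<p>Neither "tokenizers" nor "getcounter" is in the vocabulary, yet both are still representable as sequences of known pieces.</p>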
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">p_bpe</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">method_tokenizer</span> <span class="o">=</span> <span class="s2">"method_bpe"</span>
<span class="n">gen_sp_model</span><span class="p">(</span><span class="n">df_trn</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">p_bpe</span><span class="p">),</span> <span class="n">path</span><span class="p">,</span> <span class="n">method_tokenizer</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="s2">"code"</span><span class="p">)</span>
<span class="n">comment_tokenizer</span> <span class="o">=</span> <span class="s2">"comment_bpe"</span>
<span class="n">gen_sp_model</span><span class="p">(</span><span class="n">df_trn</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span> <span class="o">=</span> <span class="n">p_bpe</span><span class="p">),</span> <span class="n">path</span><span class="p">,</span> <span class="n">comment_tokenizer</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="s2">"docstring"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now that you have the ability to tokenize your text, let's explore! First you will generate the frequency of each token and, while you are at it, collect how long your methods are via the common software metric Lines of Code (LOC).</p>
</div>
</div>
</div>
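<p>Stripped of the dataframe and the trained SentencePiece model, the computation boils down to the following minimal sketch (whitespace tokenization and two hypothetical method bodies stand in for the real tokenizer and data):</p>

```python
from collections import Counter

# Hypothetical method bodies standing in for the `code` column of the dataframe.
methods = [
    "def add(a, b):\n    return a + b",
    "def sub(a, b):\n    return a - b",
]

# Token frequencies (whitespace split stands in for SentencePiece here).
cnt = Counter(tok for m in methods for tok in m.split())

# Lines of Code (LOC) per method: count newline-separated lines.
locs = [len(m.split("\n")) for m in methods]

print(cnt["def"])  # 2
print(locs)        # [2, 2]
```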
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">get_counter_and_lens</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">spm</span><span class="p">,</span> <span class="n">col</span><span class="p">):</span>
<span class="n">toks</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">locs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">progress_bar</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">())):</span>
<span class="n">toks</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">spm</span><span class="o">.</span><span class="n">EncodeAsPieces</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">col</span><span class="p">]))</span>
<span class="n">locs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">col</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)))</span>
<span class="n">cnt</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">()</span>
<span class="k">for</span> <span class="n">tok</span> <span class="ow">in</span> <span class="n">progress_bar</span><span class="p">(</span><span class="n">toks</span><span class="p">):</span>
<span class="n">cnt</span><span class="p">[</span><span class="n">tok</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">len</span><span class="p">,</span> <span class="n">toks</span><span class="p">)),</span> <span class="n">cnt</span><span class="p">,</span> <span class="n">locs</span>
<span class="n">code_lens</span><span class="p">,</span> <span class="n">code_cnt</span><span class="p">,</span> <span class="n">locs</span> <span class="o">=</span> <span class="n">get_counter_and_lens</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">method_spm</span><span class="p">,</span> <span class="s1">'code'</span><span class="p">)</span>
<span class="n">comment_lens</span><span class="p">,</span> <span class="n">comment_cnt</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">get_counter_and_lens</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">comment_spm</span><span class="p">,</span> <span class="s1">'docstring'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">plot_counts</span><span class="p">(</span><span class="n">counts</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">):</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">values</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">counts</span><span class="o">.</span><span class="n">most_common</span><span class="p">()[:</span><span class="n">top_k</span><span class="p">])</span>
<span class="n">indexes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
<span class="n">width</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">num</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">22</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">60</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'w'</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s1">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">indexes</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">width</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">indexes</span> <span class="o">+</span> <span class="n">width</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">plot_counts</span><span class="p">(</span><span class="n">code_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
<span class="n">plot_counts</span><span class="p">(</span><span class="n">comment_cnt</span><span class="p">,</span> <span class="n">top_k</span> <span class="o">=</span> <span class="mi">30</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDAAAADQCAYAAADxn5GHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dfXBU9b3H8U8SEgOXydONg3iTQKvR
UKdYBSzssmEJEGMAr6BZsAKm1taits110CjlghBokY4MjuLTADdwBdpdSmk1mkDANQ9LRVLAUkGB
FptEsdA8IRdMNtn7B7PbAEnkIbvnJLxfM5mBsye/7/dsds/ufvZ3zgnz+Xw+AQAAAAAAmFi40Q0A
AAAAAAB8HQIMAAAAAABgegQYAAAAAADA9AgwAAAAAACA6RFgAAAAAAAA0+tjdAOXYtiwYbrhhhuM
bgMAAAAAAATJkSNHVFVVdcHyHhVg3HDDDXI6nUa3AQAAAAAAgsThcHS4nENIAAAAAACA6RFgAAAA
AAAA0yPAAAAAAAAApkeAAQAAAAAATI8AAwAAAAAAmB4BBgAAAAAAMD0CDAAAAAAAYHp9jG7gajH4
6SLDah9dOtGw2gAAAAAAdAdmYAAAAAAAANMjwAAAAAAAAKZHgAEAAAAAAEyPAAMAAAAAAJje1wYY
brdb48aN09ixY/W73/1OFRUVslgsGj16tP785z9Lko4dO6bMzExZrVa98cYbkqTW1lY99NBDstls
ysvLC4z3wgsvyGq16u6771ZTU5MkdTgmAAAAAACAX5cBxunTp/X888/rnXfe0bvvvqspU6bo5z//
uYqKirRhwwbl5+dLkp577jk99dRTeu+997Ry5UqdOXNGb731lq6//nqVl5fr1KlT2rlzp06cOKE/
/OEPqqio0LRp07Ry5UpJ6nBMAAAAAAAAvy4DjJ07d6pv376aPHmypkyZos8//1wRERGKj49XSkqK
6urqJEm7du1SRkaG+vTpo+HDh2v//v3yeDzKzMyUJGVlZamyslIffPCBxowZo7CwsMCy06dPdzgm
AAAAAACAX5+ubvziiy90+PBh/fGPf1RpaakWLFigmJiYf/1ynz5qbm5WS0uLwsPPZiGxsbGqq6tT
fX19YN2LXdZ+zKioqG7fWAAAAAAA0DN1OQMjLi5OVqtVUVFRGjdunPbs2RM4b4Ukeb1eRUVFKTIy
Um1tbZKkxsZGJSQkKC4uLrDuxS5rP6afy+WSw+GQw+FQdXV19205AAAAAADoMboMMEaMGKEDBw7I
5/Np7969+ta3viWv16uGhgZVV1crISEhsJ7b7ZbX61VVVZVuueUWWSwWlZaWSpJKSkpktVo1YsQI
lZWVnbOsX79+HY7pl5OTI6fTKafTqeTk5GDcBwAAAAAAwOS6PIQkMTFRU6ZMCZy3Ys2aNaqtrVV2
drbCwsL08ssvS5Ly8/M1a9YszZs3Tz/+8Y/Vt29fTZo0SVu2bJHNZtNtt92mUaNGSZImTpwoq9Wq
+Ph4rV+/XpK0ePHiC8YEAAAAAADwC/P5fD6jm7hYDodDTqfT6DYuy+CniwyrfXTpRMNqAwAAAABw
KTr77N/lISQAAAAAAABmQIABAAAAAABMjwADAAAAAACYHgEGAAAAAAAwPQIMAAAAAABgegQYAAAA
AADA9AgwAAAAAACA6RFgAAAAAAAA0yPAAAAAAAAApkeAAQAAAAAATI8AAwAAAAAAmB4BBgAAAAAA
MD0CDAAAAAAAYHoEGAAAAAAAwPQIMAAAAAAAgOkRYAAAAAAAANMjwAAAAAAAAKZHgAEAAAAAAEyP
AAMAAAAAAJhelwHG0aNHde2118put8tut+v48eNyuVyyWCwaN26campqJEkHDx5Uenq6LBaLtm/f
Lkk6deqUpk6dqtGjR2vZsmWBMfPz82Wz2TRz5ky1tLRIUodjAgAAAAAA+H3tDIwxY8bI7XbL7XYr
Pj5ey5cvl9vt1qJFi1RQUCBJmjt3rlavXq3i4mLNnz9fkrRq1SplZ2eroqJCO3bsUG1trfbt26fa
2lqVl5crLS1NmzZtktfr7XBMAAAAAAAAv68NMCorK2Wz2TR37lwdOnRIQ4YMUVRUlKxWqz788ENJ
0meffabU1FTFxMQoISFBJ06ckMfjUWZmpiRpwoQJ2rlz5znLsrKyVFlZ2emYAAAAAAAAfl0GGAMH
DtThw4dVVlamf/zjH9q8ebNiYmICt7e2tkqS2traAstiY2NVV1en+vr6wLoXu6z9mAAAAAAAAH59
urrxmmuu0TXXXCNJmjp1qgoLC9W/f//A7REREZKk8PB/5SCNjY1KSEhQXFycmpqaFBcXp8bGRg0a
NEher1dNTU0drnf+mH4ul0sul0uSVF1dfSXbCgAAAAAAeqguZ2CcPHky8O/y8nJNnDhRBw4cUHNz
szwej4YOHSrp7EyNI0eO6OTJk6qrq1NiYqIsFotKS0slSaWlpRo5cuQ5y0pKSmS1WpWamtrhmH45
OTlyOp1yOp1KTk7u1o0HAAAAAAA9Q5czMCoqKjRv3jz169dP3/jGN1RQUKDo6GjZ7XZFR0dr7dq1
kqQlS5YoNzdXra2tWrhwoSTp4Ycf1owZM7RmzRpNmjRJSUlJSkpK0oABA2Sz2ZSSkqI5c+YoMjJS
eXl5F4wJAAAAAADgF+bz+XxGN3GxHA6HnE6n0W1clsFPFxlW++jSiYbVBgAAAADgUnT22f9rr0IC
AAAAAABgNAIMAAAAAABgegQYAAAAAADA9AgwAAAAAACA6RFgAAAAAAAA0yPAAAAAAAAApkeAAQAA
AAAATI8AAwAAAAAAmB4BBgAAAAAAMD0CDAAAAAAAYHoEGAAAAAAAwPQIMAAAAAAAgOkRYAAAAAAA
ANMjwAAAAAAAAKZHgAEAAAAAAEyPAAMAAAAAAJgeAQYAAAAAADA9AgwAAAAAAGB6fYxuAME3+Oki
w2ofXTrRsNoAAAAAgN6DGRgAAAAAAMD0LirA2Lhxo6699lpJksvlksVi0bhx41RTUyNJOnjwoNLT
02WxWLR9+3ZJ0qlTpzR16lSNHj1ay5YtC4yVn58vm82mmTNnqqWlpdMxAQAAAAAA/L42wGhtbZXL
5VJycrK8Xq+WL18ut9utRYsWqaCgQJI0d+5crV69WsXFxZo/f74kadWqVcrOzlZFRYV27Nih2tpa
7du3T7W1tSovL1daWpo2bdrU6ZgAAAAAAAB+XxtgbNy4UTk5OQoPD9ehQ4c0ZMgQRUVFyWq16sMP
P5QkffbZZ0pNTVVMTIwSEhJ04sQJeTweZWZmSpImTJignTt3nrMsKytLlZWVnY4JAAAAAADg12WA
0draKqfTqWnTpkmS6uvrFRMTc87tktTW1hZYFhsbq7q6unPWvdhl7cf0c7lccjgccjgcqq6uvpJt
BQAAAAAAPVSXVyF544035HA4FB5+NueIi4tTU1NT4PaIiAhJCtwuSY2NjUpISAisGxcXp8bGRg0a
NEherzfw++evd/6Yfjk5OcrJyZEkORyOK9lWAAAAAADQQ3U5A+Ojjz7SunXrlJWVpUOHDunFF1/U
gQMH1NzcLI/Ho6FDh0qSBg4cqCNHjujkyZOqq6tTYmKiLBaLSktLJUmlpaUaOXLkOctKSkpktVqV
mpra4ZgAAAAAAAB+Xc7AeO655wL/Hj58uF555RX95je/kd1uV3R0tNauXStJWrJkiXJzc9Xa2qqF
CxdKkh5++GHNmDFDa9as0aRJk5SUlKSkpCQNGDBANptNKSkpmjNnjiIjI5WXl3fBmAAAAAAAAH5h
Pp/PZ3QTF8vhcMjpdBrdxmUZ/HSR0S0Y4ujSiUa3AAAAAADoQTr77P+1VyEBAAAAAAAwGgEGAAAA
AAAwPQIMAAAAAABgegQYAAAAAADA9AgwAAAAAACA6RFgAAAAAAAA0yPAAAAAAAAApkeAAQAAAAAA
TK+P0Q2gdxv8dJFhtY8unWhYbQAAAABA92IGBgAAAAAAMD0CDAAAAAAAYHoEGAAAAAAAwPQIMAAA
AAAAgOkRYAAAAAAAANMjwAAAAAAAAKZHgAEAAAAAAEyPAAMAAAAAAJgeAQYAAAAAADA9AgwAAAAA
AGB6XQYYX3zxhSwWi8aMGaOMjAx9/vnnqqiokMVi0ejRo/XnP/9ZknTs2DFlZmbKarXqjTfekCS1
trbqoYceks1mU15eXmDMF154QVarVXfffbeampokqcMxAQAAAAAA/LoMMBITE1VRUaH33ntPs2bN
0urVq/Xzn/9cRUVF2rBhg/Lz8yVJzz33nJ566im99957Wrlypc6cOaO33npL119/vcrLy3Xq1Cnt
3LlTJ06c0B/+8AdVVFRo2rRpWrlypSR1OCYAAAAAAIBflwFGRESEwsPPrnLy5EndcMMNioiIUHx8
vFJSUlRXVydJ2rVrlzIyMtSnTx8NHz5c+/fvl8fjUWZmpiQpKytLlZWV+uCDDzRmzBiFhYUFlp0+
fbrDMQEAAAAAAPz6fN0Ke/fu1SOPPKKGhgZt3bpVv/nNb/71y336qLm5WS0tLYGgIzY2VnV1daqv
r1dMTMwlLWs/ZlRUlCTJ5XLJ5XJJkqqrq7tpswEAAAAAQE/ytQHGd77zHb3//vtyOp1asmRJ4LwV
kuT1ehUVFaXIyEi1tbUpPDxcjY2NSkhIUFxcXGDd9ssOHz58wbKOxvTLyclRTk6OJMnhcHTPVgMA
AAAAgB6ly0NImpubA/+OjY1V//795fV61dDQoOrqaiUkJEiSRowYIbfbLa/Xq6qqKt1yyy2yWCwq
LS2VJJWUlMhqtWrEiBEqKys7Z1m/fv06HBMAAAAAAMCvyxkYe/fu1Zw5cxQREaHo6GitWbNGhw4d
UnZ2tsLCwvTyyy9LkvLz8zVr1izNmzdPP/7xj9W3b19NmjRJW7Zskc1m02233aZRo0ZJkiZOnCir
1ar4+HitX79ekrR48eILxgQAAAAAAPAL8/l8PqObuFgOh0NOp9PoNi7L4KeLjG7hqnN06USjWwAA
AAAAXKLOPvt3eQgJAAAAAACAGRBgAAAAAAAA0yPAAAAAAAAApkeAAQAAAAAATK/Lq5AAPZmRJ07l
BKIAAAAA0L0IMIAguFqvOkNwAwAAACBYOIQEAAAAAACYHgEGAAAAAAAwPQIMAAAAAABgegQYAAAA
AADA9AgwAAAAAACA6RFgAAAAAAAA0yPAAAAAAAAApkeAAQAAAAAATI8AAwAAAAAAmF4foxsA0HsM
frrI6BYMcXTpRKNbAAAAAHo9ZmAAAAAAAADTI8AAAAAAAACmR4ABAAAAAABMr8sAY9euXRo1apTS
09N1//33q6WlRS6XSxaLRePGjVNNTY0k6eDBg0pPT5fFYtH27dslSadOndLUqVM1evRoLVu2LDBm
fn6+bDabZs6cqZaWFknqcEwAAAAAAAC/LgOM5ORk7dixQ2VlZRo8eLB+//vfa/ny5XK73Vq0aJEK
CgokSXPnztXq1atVXFys+fPnS5JWrVql7OxsVVRUaMeOHaqtrdW+fftUW1ur8vJypaWladOmTfJ6
vR2OCQAAAAAA4NdlgDFw4ED17dtXkhQVFaWPP/5YQ4YMUVRUlKxWqz788ENJ0meffabU1FTFxMQo
ISFBJ06ckMfjUWZmpiRpwoQJ2rlz5znLsrKyVFlZqUOHDnU4JgAAAAAAgN9FXUb1008/1datW7V0
6VIdP348sLy1tVWS1NbWFlgWGxururo61dfXKyYm5oJlAwcO7HS99mMCQE9h5OVjuYQrAAAArhZf
G2A0NTVp5syZKiwsVGtrq5qamgK3RURESJLCw/81kaOxsVEJCQmKi4tTU1OT4uLi1NjYqEGDBsnr
9QZ+//z1zh/Tz+VyyeVySZKqq6uvYFMBAAAAAEBP1eUhJF6vV9OnT9eCBQt08803KzU1VQcOHFBz
c7M8Ho+GDh0q6eyhJkeOHNHJkydVV1enxMREWSwWlZaWSpJKS0s1cuTIc5aVlJTIarV2OqZfTk6O
nE6nnE6nkpOTg3EfAAAAAAAAk+tyBsbGjRv1/vvvq6CgQAUFBZo9e7by8vJkt9sVHR2ttWvXSpKW
LFmi3Nxctba2auHChZKkhx9+WDNmzNCaNWs0adIkJSUlKSkpSQMGDJDNZlNKSormzJmjyMjIDscE
AAAAAADwC/P5fD6jm7hYDodDTqfT6DYui5HHyAPovTgHBgAAAHqbzj77X9RJPAEA5sQJRAEAAHC1
6PIcGAAAAAAAAGZAgAEAAAAAAEyPAAMAAAAAAJge58AAAFwWzr8BAACAUCLAAAD0OIQnAAAAVx8O
IQEAAAAAAKZHgAEAAAAAAEyPAAMAAAAAAJgeAQYAAAAAADA9AgwAAAAAAGB6BBgAAAAAAMD0uIwq
AACXgEu4AgAAGIMZGAAAAAAAwPSYgQEAQA/B7A8AAHA1YwYGAAAAAAAwPQIMAAAAAABgegQYAAAA
AADA9AgwAAAAAACA6XUZYDQ2NuqOO+5Q//79tX//fkmSy+WSxWLRuHHjVFNTI0k6ePCg0tPTZbFY
tH37dknSqVOnNHXqVI0ePVrLli0LjJmfny+bzaaZM2eqpaWl0zEBAAAAAAD8ugww+vXrp6KiIt13
332SJK/Xq+XLl8vtdmvRokUqKCiQJM2dO1erV69WcXGx5s+fL0latWqVsrOzVVFRoR07dqi2tlb7
9u1TbW2tysvLlZaWpk2bNnU6JgAAAAAAgF+XAUZkZKSuvfbawP8PHTqkIUOGKCoqSlarVR9++KEk
6bPPPlNqaqpiYmKUkJCgEydOyOPxKDMzU5I0YcIE7dy585xlWVlZqqys7HRMAAAAAAAAv0s6B0Z9
fb1iYmIC/29tbZUktbW1BZbFxsaqrq7unHUvdln7MQEAAAAAAPz6XMrKcXFxampqCvw/IiJCkhQe
/q8cpLGxUQkJCYF14+Li1NjYqEGDBsnr9QZ+//z1zh/Tz+VyyeVySZKqq6svcfMAAAAAAEBvcEkz
MFJTU3XgwAE1NzfL4/Fo6NChkqSBAwfqyJEjOnnypOrq6pSYmCiLxaLS0lJJUmlpqUaOHHnOspKS
Elmt1k7H9MvJyZHT6ZTT6VRycnJ3bDMAAAAAAOhhvnYGRnZ2tvbu3auPP/5YjzzyiPLy8mS32xUd
Ha21a9dKkpYsWaLc3Fy1trZq4cKFkqSHH35YM2bM0Jo1azRp0iQlJSUpKSlJAwYMkM1mU0pKiubM
maPIyMgOxwQAAAAAAPAL8/l8PqObuFgOh0NOp9PoNi7L4KeLjG4BAIAe6ejSiUa3AAAAQqizz/6X
dAgJAAAAAACAES7pJJ4AAAChxizG0GPWCwDAjJiBAQAAAAAATI8ZGAAAADiHkbNemP0BAOgMAQYA
AABMg/AEANAZDiEBAAAAAACmxwwMAAAAQMz+AACzI8AAAAAADMbVdkKP0AjoeQgwAAAAAFx1mHED
9DwEGAAAAAAQQoQnwOUhwAAAAACAq8TVergSwU3vQIABAAAAAOjVCG56By6jCgAAAAAATI8AAwAA
AAAAmB4BBgAAAAAAMD0CDAAAAAAAYHoEGAAAAAAAwPQIMAAAAAAAgOkRYAAAAAAAANMjwAAAAAAA
AKZnmgAjPz9fNptNM2fOVEtLi9HtAAAAAAAAEzFFgLFv3z7V1taqvLxcaWlp2rRpk9EtAQAAAAAA
EzFFgOHxeJSZmSlJysrKUmVlpcEdAQAAAAAAM+ljdAOSVF9fr4EDB0qSYmNjVVdXF7jN5XLJ5XJJ
knbv3i2Hw2FIj1fqDgNrV1dXKzk5mdrUpja1qU1talOb2tSmNrWpfRXVHjVqsWG1r8SRI0c6vsFn
AitXrvStXbvW5/P5fLt37/Y99thjBnfUu+Tk5FCb2tSmNrWpTW1qU5va1KY2tando5niEBKLxaLS
0lJJUklJiaxWq8EdAQAAAAAAM4l49tlnnzW6ieuuu04ej0cFBQVqbm7WM888o4iICKPb6lVuueUW
alOb2tSmNrWpTW1qU5va1KY2tXusMJ/P5zO6CQAAAAAAgK6Y4hASAAAAAACArhBgIKjefvttrV69
2ug2DOF0OnX77bdflZcF9vl8ysrK0vTp041uJeiqq6v1X//1X0a3IUmaPn26vF5v0OuYaZsRWnl5
eTp9+rT+7//+T3a7XePHjw96zddffz3oNS7Fli1b9I9//COkNefNm6d9+/ZJkh588MFzrtbWnfyv
2dOmTRMTdC80fPhwSVJubq72799vcDfBYbfb5Xa7ZYIjzEPK/7dtz/933rt3r1555ZVuq+V/ntnt
dj377LNyu93dNrbRfv3rX2vkyJFKT0/XlClTJElut1uffPJJh+v7X1OCzQz7tu5+HF3NCDAQVK+9
9poeeOABo9swhMPh0OLFi/XOO+8Y3UrIHTt2TOHh4fr1r39tdCtBl5ycrGPHjqmhocHQPsrKyvTt
b39bffoE/+rYZtlmhN6KFSvUt29f7du3T7feemvgBNxXqq2trdPbLiXA6Gqc7mJEgOG/v9va2tTQ
0KCEhISg1PG/ZlssFm3dujUoNYDOFBYWmvLD/He+8x3Nnj2728brze+Nly5dqrKyMpWVlWnNmjWS
Og8w2traAq8pwWaGfVt3P46uZgQYCJqGhga1trYqOjra6FYM069fP505c8boNkLuq6++0r/9278Z
3UbI2Gw2lZSUGNrDli1bNGHChJDV829zqL49MYM//vGP+u53v6uxY8ca9u3kI488YkhdP7vdri+/
/FI/+9nPtHnzZj366KNXPN5TTz2lO++8Uz6fTz/5yU80duxYjR8/XjU1NXrllVf08ccfy263a8eO
HYH6knTffffp6NGjKiws1PTp0zV58mQVFxfr9ttv1+OPP67vfve7eu655y66F6/Xq/vuu0/jx4/X
Y489ptzcXBUXF8tms8lisWjjxo3629/+puLiYn3/+9/XU089dUXbfrG++OILDRgwQJK0e/duDRs2
LCh12r9mT5gwQVu2bJF09kPlzp07g1JTOvsYeOKJJ5Senq7HH39cknTmzBnNmDFDGRkZuvvuu9XU
1KTXX39dGzZs0OnTp3XNNdfo73//u9577z0tWLCg2+sXFhbqpZdekiS99dZbIX++Nzc3q6WlJaQ1
/datW6c77rgjcF/0ZHa7XT/5yU+Unp6un/3sZ5I6/9t++eWXuv/++zV8+HBt2LDhnHHcbrfmzJkj
6ew3+SNHjpTdbtf//u//XnJP7Z9n69at0+OPP6477rjjCrby4rW0tKi5uTmoNU6fPi2Px6PW1lbF
x8fr9OnTKiws1DPPPKNZs2bJ7XZr8uTJmjJligoLCwP79MLCQt17772aPHmyRowYoc8//1yS9Itf
/EKjRo3ST3/6U91+++2X1VNn+7ZQa/84wpUhwEDQfPLJJxo8eLDRbRgqNjZWNTU1RrcRctXV1YqN
jQ1pzV/96ley2+3n/CxdujQktb/5zW/qo48+Ckmtzhw8eFDf/OY3Q1bPv82h+vbEDIqKirRgwQK9
++67mj9/viE9vPbaa4bUPd+yZcs0bdo0vfzyy1c81p133qlt27apqKhI8fHxevfdd7VkyRItXbpU
s2fP1s033yy3262MjIxOx4iMjNSbb76p7OxsNTQ06Mknn5TH47mkDxhbtmzRTTfdpNLSUt16663y
+XwqKCjQ9u3bVV5erpdeekkpKSnKysrS//zP/2jZsmVXvO0Xo6SkRHfeeackqbi4WHfddVdQ6rR/
zW6/T8vNzdWoUaOCUtPvnnvuUVlZmaqqqtTY2KhVq1YpIyNDO3bs0AMPPKDXX39dNptN5eXlev/9
95WRkaHy8nKVl5crPT292+sb5S9/+YueeOIJZWRkGNZHSkqK+vXrp8TEREPqd7fJkyerrKxMX3zx
hf70pz91ul5NTY1WrlypyspKLVu2TK2trRes09bWpmeeeUZbt26V2+2+rFkU7Z9nKSkpSkxMVL9+
/S55nMvR2NiojIwMPfHEE/rLX/4SlBrr16/Xiy++qBtvvFELFy5U3759lZubq1/+8pdat25doI/N
mzfroYceOud3Y2Nj9eabb+qhhx6Sy+XSsWPHVFJSIo/Ho8cff1z19fWX1VNn+7berqv3xUa+Z+4O
wZ9rDFzFbr31Vn388ceaPn36VXE4hXT2Q96DDz6oTZs2hbTuk08+qSeffDKkNXF1eeyxx7R48WKt
X79eDzzwgLKzs41uqVcYMWKEJOmjjz7S7373O5WVlcnn8yk5OfmCdcPCwgL/bn8cs38MSYqPj9eg
QYMk6ZJmAB4+fDgwu2HYsGHasmWLPvnkE2VmZko6+y3e8ePHL2HLuse2bdv04osvSjo7A2PevHkh
7yHYbrvtNknSf/zHf6ihoUEfffSRPvjgA61bt04tLS2y2WxKS0vTgQMHVFZWprlz52rDhg2qrq7W
E0880e31O3ucBUNLS4sKCwu1adMmDRo0SN///ve1fPnyoNY0m0WLFmnHjh06duyYoqOjFRcXp+99
73v60Y9+dMVj+5/TI0aM0KFDhzr9237jG98IHJqVnJysEydOXDDW8ePHlZycrJiYGElSeHjP+h44
MTFRFRUV8ng8euGFF/Tpp5/qvvvuU25uriIjI7ulxvDhw/Xb3/5Wzc3NysrK0sGDBztcp/3fwc//
PExOTlZVVZWOHj2qoUOHKiwsTDfddJP69+/fLT1eLbp6X9zT3zMTYCBobrrpJh09etToNgy1d+9e
paamhjS8qKmpUVJSUsjqnW/ixInavHmz1q1bJ7vdHrK6v/rVr1RUVHTOsqysLD399NNBr/3Xv/5V
Q4YMCXqdrtx8883661//GrJvzfzbXFtbq+uvv77DNyO9TWxsrF566SU1Nzdr2LBhhgQYRj+/g8H/
ISAtLU0Oh0P//d//LUmBKfTtH1vx8fGqqanRjTfeeM43iO0/SFzuY/HGG2/Unj17dO+992rPnj1K
TExUWlqatm7dqqioKLW0tCgyMlKRkZEdfjsbDG1tbWpqalJcXJzq6uoUHx8ftA9N7V+z2+/T6urq
FB0dHdRvic//UJmWlqZRo0Zp5syZks4+FsLCwpSQkKDKykrNnz9fy5cv11dffdUtfZ1fPz4+XgcO
HJCkwMlTg+XkyZN69dVXdS2onNYAAASPSURBVPvtt2v27NmBD3FXk/nz52v+/PkqLCzU4MGDu/W9
w549ezR+/Hjt3r1bdrtdtbW1Hf5tjx49qvr6evXr10/V1dUdvpZee+21qqmp0Zdffqn+/furra3t
kp+PZnhvbLFY1LdvX7388st69dVXde+993bbeXUOHTqk1NRURUVFKTY2Vj6f74J9Zmf32fnPw8GD
B2v//v3y+Xw6fPhw4PDBS9XZvq236+p9sZHvmbtDz4oOcVmOHTt2xceIXo64uDiFh4dfleeA8Gtq
alJKSkrI6nm9Xt1///0hq9eZlJSUkJ/g8cknn5Tb7T7nJ1Q74vLy8sAUb6Pcc8892rZtW8jq+bd5
xowZQT+m1ixee+01paeny263Kzc3N+T1zfL8DpbJkyfrn//8p8aOHauMjIzAdOObb75Z9957ryor
K/Xoo48qJydHDz74YOC8EN3lnnvu0cGDBzVu3Di9//77uuaaazRv3jxNmDBBY8eODUwXv+uuu5SX
l6clS5Z0a/2O7Nq1KzC7ZOvWrUE9z0371+xt27bpP//zPyVJy5cvD+o5MDryox/9SNu2bVNGRoYy
MjICJ90bPXp04PxK1113XdA+7I8fP14ej0fZ2dn69NNPg1LDLyEhQVVVVXr00Ue1evVqjR07VitW
rNBXX30V1LpXi3feeUfp6elKTEzUsGHDOv3bJicn66c//amsVqvmzJmjiIiIC8YKDw/XkiVLNG7c
OI0dO1br16+/5H6MfG/81VdfacWKFRo7dqxWr16tRx99VFVVVd16UuA5c+bIYrFo9OjRSktL05Ah
Q5SRkaHnn38+cB6Si3XddddpwoQJGjVqlFasWHHZfXa2b+vtunpfbOR75u4Q5uM6WQiioqIiHTt2
TD/4wQ9CUq/9JSRDcTWG82u2Fx4erjfffFNVVVVatGhRSHrZtWuX9u3bpx/+8IdBr9XVdh8/flwP
PPBAt12h4GLrGjGds7q6Ws8//7xWrFgRknpdbf/999+v9evXB/2x79/m559/Xo899pheffXVoNbD
WWZ5fl/O88xsz9vO+GdZvP7666qvr1d+fn7Iand0H1VVVSkxMVE33HCDSkpKNHz4cP37v/970Hrw
v2Zv3bpVGzduVHh4uGbPnq0XX3zxivcrRj8GjK5/sc6cOaPNmzcrKysraFebkYy5P4JVs7NxMzIy
9Pbbb5vu0INQvzf2q6urU3FxsaZOnXrFJ9gP1ePHv0/+5JNPlJeXp7fffvuyxulo39Zb9JR9W7fy
Ab2IpMBPKFRXV59Ts/3PkCFDfOnp6b49e/aEpJdQ6mq7FyxY4PP5fL5Zs2b5HA5HyOv2Zlf79vt8
Pl9DQ4NvzJgx5/zs3r3b6LZ6le5+nPWkx+1dd93ls9lsvvHjx/v++c9/hqxuT7qPLofR22d0fbMx
4v4IVs2uxh0zZozv5MmT3bcR8Pl8oX38PPPMM7709HTfiBEjfB988EG3jt0bXK37NmZgAAAAAAAA
0+ul80oAAAAAAEBvQoABAAAAAABMjwADAAAAAACYHgEGAAAAAAAwPQIMAAAAAABgev8PFLEHrpOd
occAAAAASUVORK5CYII=
" />
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDAAAADQCAYAAADxn5GHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAAJOgAACToB8GSSSgAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de1zUdb7H8Tcgph0fgKytWYi2xabH
smOGAsMA3ogQ3dUEq9Ui85RmblYeKbc1lWNlF9pOma3rtdVtY3xUtvlIWkKSi23pEdS8HHPVRcxO
HAjIMpjhe/7owSyYN3AuP+z1fDx6JN+Z+X0+37n8ZuY9399MgDHGCAAAAAAAwMIC/d0AAAAAAADA
uRBgAAAAAAAAyyPAAAAAAAAAlkeAAQAAAAAALI8AAwAAAAAAWF4nfzfQFoMHD9bVV1/t7zYAAAAA
AICXHDx4UNu3b//BeIcKMK6++mrl5ub6uw0AAAAAAOAlGRkZpx3nEBIAAAAAAGB5BBgAAAAAAMDy
CDAAAAAAAIDlnTXAqK2t1ZAhQ9StWzft3r3bPX7kyBFdcskl7rF9+/YpISFBcXFx+uCDDyRJJ06c
0Pjx4xUfH69nnnnGfdmsrCzZ7XZNnjxZjY2NkiSHw6G4uDiNGDFCR48e9fgkAQAAAABAx3bWAOPS
Sy/Vxo0bNWHChFbjzzzzjGw2m/vvuXPnasWKFdq0aZPmzZsnSVq+fLlSU1NVXFysgoICVVZWqry8
XJWVlSoqKlK/fv20fv16OZ1O5eTkqLCwUAsXLlR2drYXpgkAAAAAADqyswYYwcHBuuyyy1qNHTp0
SAEBAYqMjHSPHTt2TFFRUQoJCVF4eLiqqqpUWlqq5ORkSdKoUaO0devWVmMpKSkqKSnRgQMH1L9/
f3Xu3Fk2m007d+709BwBAAAAAEAH1+bvwFi8eLFmz57daqypqcn979DQUFVXV6umpkYhISFtGpMk
l8vVatsOh0MZGRnKyMhQRUVFW9sFAAAAAAAXgU5tOfPBgwclSX379m01Hhj4zxyktrZW4eHhCgsL
U11dncLCwlRbW6s+ffrI6XSqrq7utOdrFhQU1Grb6enpSk9Pl3Tm34LtCPo+utFvtQ8/PdpvtQEA
AAAA8IQ2rcAoLy/Xp59+qpSUFP31r3/VtGnTdPLkSfXq1UsHDx5UfX29qqur1aNHD8XFxSk/P1+S
lJ+fr5iYmFZjeXl5stlsioqK0t69e9XQ0KDS0lINHDjQ87MEAAAAAAAd2jlXYKSmpqqsrEz79+/X
fffdp6KiIklSZmamZs+erS5dumjRokXKzMyUy+XSggULJElTp07VpEmTtHLlSqWlpSkiIkIRERHq
2bOn7Ha7IiMjNXv2bAUHB2vWrFlKSkpSly5dtGbNGu/OGAAAAAAAdDgBxhjj7ybOV0ZGhnJzc/3d
RrtwCAkAAAAAAOd2pvf+bf4STwAAAAAAAF8jwAAAAAAAAJZHgAEAAAAAACyPAAMAAAAAAFgeAQYA
AAAAALA8AgwAAAAAAGB5BBgAAAAAAMDyCDAAAAAAAIDlEWAAAAAAAADLI8AAAAAAAACWR4ABAAAA
AAAsjwADAAAAAABYHgEGAAAAAACwPAIMAAAAAABgeQQYAAAAAADA8ggwAAAAAACA5Z01wKitrdWQ
IUPUrVs37d69W/X19Ro+fLgSEhI0fPhwHTlyRJK0b98+JSQkKC4uTh988IEk6cSJExo/frzi4+P1
zDPPuLeZlZUlu92uyZMnq7GxUZLkcDgUFxenESNG6OjRo96aKwAAAAAA6KDOGmBceuml2rhxoyZM
mCBJCg4O1tq1a7VlyxZlZWXp2WeflSTNnTtXK1as0KZNmzRv3jxJ0vLly5Wamqri4mIVFBSosrJS
5eXlqqysVFFRkfr166f169fL6XQqJydHhYWFWrhwobKzs708ZQAAAAAA0NGcNcAIDg7WZZdd5v67
S5cuuuKKKyRJnTt3VmDg9xc/duyYoqKiFBISovDwcFVVVam0tFTJycmSpFGjRmnr1q2txlJSUlRS
UqIDBw6of//+6ty5s2w2m3bu3OmViQIAAAAAgI6rXd+B0dDQoPnz52vmzJmSpKamJvdpoaGhqq6u
Vk1NjUJCQto0Jkkul6vdkwEAAAAAABenTu250L333qv7779fUVFRkuReiSF9/70Z4eHhCgsLU11d
ncLCwlRbW6s+ffrI6XSqrq7utOdrFhQU1KqWw+GQw+GQJFVUVLSnXQAAAAAA0MG1eQXGggUL9LOf
/UwTJ050j/Xq1UsHDx5UfX29qqur1aNHD8XFxSk/P1+SlJ+fr5iYmFZjeXl5stlsioqK0t69e9XQ
0KDS0lINHDiwVb309HTl5uYqNzdXvXv3vpC5AgAAAACADuqcKzBSU1NVVlam/fv3KzU1VdnZ2YqP
j1dBQYFiY2P11FNPadGiRcrMzJTL5dKCBQskSVOnTtWkSZO0cuVKpaWlKSIiQhEREerZs6fsdrsi
IyM1e/ZsBQcHa9asWUpKSlKXLl20Zs0ar08aAAAAAAB0LAHGGOPvJs5XRkaGcnNz/d1Gu/R9dKPf
ah9+erTfagMAAAAA0BZneu/fri/xBAAAAAAA8CUCDAAAAAAAYHkEGAAAAAAAwPIIMAAAAAAAgOWd
81dI0PHxBaIAAAAAgI6OAANeRXgCAAAAAPAEDiEBAAAAAACWR4ABAAAAAAAsjwADAAAAAABYHgEG
AAAAAACwPAIMAAAAAABgeQQYAAAAAADA8ggwAAAAAACA5RFgAAAAAAAAyyPAAAAAAAAAlkeAAQAA
AAAALI8AAwAAAAAAWN5ZA4za2loNGTJE3bp10+7duyVJDodDcXFxGjFihI4ePSpJ2rdvnxISEhQX
F6cPPvhAknTixAmNHz9e8fHxeuaZZ9zbzMrKkt1u1+TJk9XY2HjGbQIAAAAAADQ7a4Bx6aWXauPG
jZowYYIkyel0KicnR4WFhVq4cKGys7MlSXPnztWKFSu0adMmzZs3T5K0fPlypaamqri4WAUFBaqs
rFR5ebkqKytVVFSkfv36af369WfcJgAAAAAAQLOzBhjBwcG67LLL3H8fOHBA/fv3V+fOnWWz2bRz
505J0rFjxxQVFaWQkBCFh4erqqpKpaWlSk5OliSNGjVKW7dubTWWkpKikpKSM24TAAAAAACgWZu+
A6OmpkYhISHuv10ulySpqanJPRYaGqrq6upW5z3fsZbbBAAAAAAAaNapLWcOCwtTXV2d+++goCBJ
UmDgP3OQ2tpahYeHu88bFham2tpa9enTR06n0335U8936jabORwOORwOSVJFRUUbpwcAAAAAAC4G
bVqBERUVpb1796qhoUGlpaUaOHCgJKlXr146ePCg6uvrVV1drR49eiguLk75+fmSpPz8fMXExLQa
y8vLk81mO+M2m6Wnpys3N1e5ubnq3bu3J+YMAAAAAAA6mHOuwEhNTVVZWZn279+v++67T7NmzVJS
UpK6dOmiNWvWSJIWLVqkzMxMuVwuLViwQJI0depUTZo0SStXrlRaWpoiIiIUERGhnj17ym63KzIy
UrNnz1ZwcPBptwkAAAAAANAswBhj/N3E+crIyFBubq6/22iXvo9u9HcLPzqHnx7t7xYAAAAAAG10
pvf+bTqEBAAAAAAAwB8IMAAAAAAAgOURYAAAAAAAAMsjwAAAAAAAAJZHgAEAAAAAACyPAAMAAAAA
AFgeAQYAAAAAALA8AgwAAAAAAGB5BBgAAAAAAMDyCDAAAAAAAIDlEWAAAAAAAADL6+TvBgBv6fvo
Rr/VPvz0aL/VBgAAAICLESswAAAAAACA5RFgAAAAAAAAyyPAAAAAAAAAlkeAAQAAAAAALI8AAwAA
AAAAWF6bA4ympiZlZmbKbrcrPj5e+/btU3FxseLi4hQfH69du3ZJko4fP67k5GTZbDatXbtWkuRy
uTRlyhTZ7XbNmjXLvc0XX3xRNptNY8eOVV1dnYemBgAAAAAALhZtDjDKysr03XffqaioSE899ZRy
cnL0m9/8Rhs3btSf/vQnZWVlSZIWL16sOXPm6MMPP9SSJUt08uRJvfvuu7riiitUVFSkEydOaOvW
raqqqtI777yj4uJiTZw4UUuWLPH4JAEAAAAAQMfW5gAjIiJCxhgZY1RTU6N/+Zd/UVBQkLp3767I
yEhVV1dLkj7++GMNHz5cnTp10k033aTdu3ertLRUycnJkqSUlBSVlJTok08+UWJiogICAtxjAAAA
AAAALXVq6wV69Oih4OBg9evXTydPnlRRUZF+/etf/3ODnTqpoaFBjY2NCgz8Ph8JDQ1VdXW1ampq
FBIScs6xlhwOhxwOhySpoqKifbMEAAAAAAAdWpsDjPfff1+dOnXS/v37tW3bNj3yyCOtvrfC6XSq
c+fOCg4OVlNTkwIDA1VbW6vw8HCFhYW5z9ty7LPPPms11lJ6errS09MlSRkZGe2eKAAAAAAA6Lja
fAiJMUY/+clPJH2/GqO+vl5Op1NfffWVKioq3AFEdHS0CgsL5XQ6tX37dg0YMEBxcXHKz8+XJOXl
5clmsyk6OlpbtmxpNQYAAAAAANBSm1dgjBo1SqtXr1ZiYqK+++475eTkyOl0KjU1VQEBAXrllVck
SVlZWbrzzjv1+OOPa9q0aeratavS0tL09ttvy263a9CgQYqNjZUkjR49WjabTd27d9e6des8O0MA
AAAAANDhBRhjjL+bOF8ZGRnKzc31dxvt0vfRjf5uAT50+OnR/m4BAAAAADqkM733b/MhJAAAAAAA
AL5GgAEAAAAAACyPAAMAAAAAAFgeAQYAAAAAALA8AgwAAAAAAGB5BBgAAAAAAMDyCDAAAAAAAIDl
EWAAAAAAAADLI8AAAAAAAACW18nfDQAXo76PbvRb7cNPj/ZbbQAAAADwFlZgAAAAAAAAyyPAAAAA
AAAAlkeAAQAAAAAALI8AAwAAAAAAWB4BBgAAAAAAsDwCDAAAAAAAYHkEGAAAAAAAwPLaFWAUFhZq
xIgRGjZsmN566y0VFxcrLi5O8fHx2rVrlyTp+PHjSk5Ols1m09q1ayVJLpdLU6ZMkd1u16xZs9zb
e/HFF2Wz2TR27FjV1dV5YFoAAAAAAOBi0uYA49tvv9Xzzz+v9957T5s3b9a4ceP0m9/8Rhs3btSf
/vQnZWVlSZIWL16sOXPm6MMPP9SSJUt08uRJvfvuu7riiitUVFSkEydOaOvWraqqqtI777yj4uJi
TZw4UUuWLPH4JAEAAAAAQMfW5gBj69at6tq1q8aMGaNx48bp888/V1BQkLp3767IyEhVV1dLkj7+
+GMNHz5cnTp10k033aTdu3ertLRUycnJkqSUlBSVlJTok08+UWJiogICAtxjAAAAAAAALXVq6wW+
+OILffbZZ/roo4+Un5+vJ554QiEhIf/cYKdOamhoUGNjowIDv89HQkNDVV1drZqaGvd5zzbWksPh
kMPhkCRVVFS0b5YAAAAAAKBDa/MKjLCwMNlsNnXu3FkjRozQjh07Wn1vhdPpVOfOnRUcHKympiZJ
Um1trcLDwxUWFuY+79nGWkpPT1dubq5yc3PVu3fvdk8UAAAAAAB0XG0OMKKjo7V3714ZY1RWVqZ/
/dd/ldPp1FdffaWKigp3ABEdHa3CwkI5nU5t375dAwYMUFxcnPLz8yVJeXl5stlsio6O1pYtW1qN
AQAAAAAAtNTmQ0h69OihcePGub+3YuXKlaqsrFRqaqoCAgL0yiuvSJKysrJ055136vHHH9e0adPU
tWtXpaWl6e2335bdbtegQYMUGxsrSRo9erRsNpu6d++udevWeXaGAAAAAACgwwswxhh/N3G+MjIy
lJub6+822qXvoxv93QJ+JA4/PdrfLQAAAABAu53pvX+bDyEBAAAAAADwNQIMAAAAAABgeQQYAAAA
AADA8ggwAAAAAACA5RFgAAAAAAAAyyPAAAAAAAAAlkeAAQAAAAAALI8AAwAAAAAAWB4BBgAAAAAA
sDwCDAAAAAAAYHkEGAAAAAAAwPI6+bsBAJ7V99GNfqt9+OnRfqsNAAAA4OLGCgwAAAAAAGB5rMAA
4DGs/gAAAADgLazAAAAAAAAAlkeAAQAAAAAALK/dAcbrr7+uyy67TJLkcDgUFxenESNG6OjRo5Kk
ffv2KSEhQXFxcfrggw8kSSdOnND48eMVHx+vZ555xr2trKws2e12TZ48WY2NjRcyHwAAAAAAcBFq
V4DhcrnkcDjUu3dvOZ1O5eTkqLCwUAsXLlR2drYkae7cuVqxYoU2bdqkefPmSZKWL1+u1NRUFRcX
q6CgQJWVlSovL1dlZaWKiorUr18/rV+/3nOzAwAAAAAAF4V2BRivv/660tPTFRgYqAMHDqh///7q
3LmzbDabdu7cKUk6duyYoqKiFBISovDwcFVVVam0tFTJycmSpFGjRmnr1q2txlJSUlRSUuKhqQEA
AAAAgItFmwMMl8ul3NxcTZw4UZJUU1OjkJCQVqdLUlNTk3ssNDRU1dXVrc57tjEAAAAAAICW2vwz
qmvXrlVGRoYCA7/PPsLCwlRXV+c+PSgoSJLcp0tSbW2twsPD3ecNCwtTbW2t+vTpI6fT6b588/la
cjgccjgckqSKioq2tgsAAAAAAC4CbV6BsWfPHr322mtKSUnRgQMH9NJLL2nv3r1qaGhQaWmpBg4c
KEnq1auXDh48qPr6elVXV6tHjx6Ki4tTfn6+JCk/P18xMTGtxvLy8mSz2VrVS09PV25urnJzc9W7
d+8LnS8AAAAAAOiA2rwCY/Hixe5/33TTTVq6dKneeOMNJSUlqUuXLlqzZo0kadGiRcrMzJTL5dKC
BQskSVOnTtWkSZO0cuVKpaWlKSIiQhEREerZs6fsdrsiIyM1e/ZsD00NAAAAAABcLAKMMcbfTZyv
jIwM5ebm+ruNdun76EZ/twDASw4/PdrfLQAAAAAXjTO992/Xr5AAAAAAAAD4UpsPIQEAtObPFVas
/gAAAMCPBQEGAHRghCcAAAD4seAQEgAAAAAAYHkEGAAAAAAAwPIIMAAAAAAAgOURYAAAAAAAAMsj
wAAAAAAAAJbHr5AAANqFX0ABAACAL7ECAwAAAAAAWB4BBgAAAAAAsDwOIQEAdDgcvgIAAPDjwwoM
AAAAAABgeQQYAAAAAADA8jiEBACANuDwFQAAAP9gBQYAAAAAALA8AgwAAAAAAGB5bT6E5OOPP9aD
Dz6o4OBgXXnllXrttdf09ttv64UXXlDXrl21Zs0aRUREaN++fbr33nvldDqVnZ2tESNG6MSJE5o8
ebL+93//V2PHjtWcOXMkSVlZWSotLVXfvn21cuVKBQcHe3yiAAB0dBy+AgAAfszavAKjd+/eKigo
0JYtW9S3b19t2LBBOTk5Kiws1MKFC5WdnS1Jmjt3rlasWKFNmzZp3rx5kqTly5crNTVVxcXFKigo
UGVlpcrLy1VZWamioiL169dP69ev9+wMAQAAAABAh9fmAKNXr17q2rWrJKlz587av3+/+vfvr86d
O8tms2nnzp2SpGPHjikqKkohISEKDw9XVVWVSktLlZycLEkaNWqUtm7d2mosJSVFJSUlnpobAAAA
AAC4SLT7V0iOHDmi999/X08//bS+/PJL97jL5ZIkNTU1ucdCQ0NVXV2tmpoahYSE/GCsV69ercZa
cjgccjgckqSKior2tgsAAC4Ah68AAAB/a1eAUVdXp8mTJ2v16tVyuVyqq6tznxYUFCRJCgz85+KO
2tpahYeHKywsTHV1dQoLC1Ntba369Okjp9Ppvnzz+VpKT09Xenq6JCkjI6M97QIAAAAAgA6uzQGG
0+nUbbfdpieeeELXXnutGhsbtXfvXjU0NGjbtm0aOHCgpO8PNTl48KB++tOfqrq6Wj169FBcXJzy
8/M1ZcoU5efn6w9/+IOqqqqUk5OjO++8U3l5ebLZbB6fJAAA6LhY/QEAAKR2BBivv/66/va3vyk7
O1vZ2dmaPn26Zs2apaSkJHXp0kVr1qyRJC1atEiZmZlyuVxasGCBJGnq1KmaNGmSVq5cqbS0NEVE
RCgiIkI9e/aU3W5XZGSkZs+e7dkZAgAAtBPhCQAA1hFgjDH+buJ8ZWRkKDc3199ttIs/XwABAAC0
BeEJAMCfzvTev91f4gkAAICLEytPAABW1OafUQUAAAAAAPA1VmAAAADAMlj9AQA4E1ZgAAAAAAAA
y2MFBgAAAKAf75eus/IEQEdBgAEAAAD8iP1Ygxt/IjQC2ocAAwAAAAB86McaGhHc4EIRYAAAAAAA
vO7HGtz408UWGvElngAAAAAAwPIIMAAAAAAAgOURYAAAAAAAAMsjwAAAAAAAAJZHgAEAAAAAACyP
AAMAAAAAAFgeAQYAAAAAALA8AgwAAAAAAGB5BBgAAAAAAMDyLBNgZGVlyW63a/LkyWpsbPR3OwAA
AAAAwEIsEWCUl5ersrJSRUVF6tevn9avX+/vlgAAAAAAgIVYIsAoLS1VcnKyJCklJUUlJSV+7ggA
AAAAAFhJJ383IEk1NTXq1auXJCk0NFTV1dXu0xwOhxwOhyRp27ZtysjI8EuPF2qIH2tXVFSod+/e
1KY2talNbWpTm9rUpja1qU3tH1Ht2Nj/9FvtC3Hw4MHTn2AsYMmSJWbNmjXGGGO2bdtmZsyY4eeO
Li7p6enUpja1qU1talOb2tSmNrWpTW1qd2iWOIQkLi5O+fn5kqS8vDzZbDY/dwQAAAAAAKwkaP78
+fP93cTll1+u0tJSZWdnq6GhQY899piCgoL83dZFZcCAAdSmNrWpTW1qU5va1KY2talNbWp3WAHG
GOPvJgAAAAAAAM7GEoeQAAAAAAAAnA0BBjxu9erVamhokCTNnz9f7777rp87Ai5OLR9rP3ZlZWUa
MmSIHnnkEX+34lWzZs3St99+6+82LGn37t3KzMz02vZffvllrV692mvbPx1/3N55eXmKjo7Ws88+
6/VaZWVlWrp0qdfrtNdNN93k1e2f6/XS8ePH9cQTT3i1h6SkJH399dderXEmhw8f1vvvv++X2lbh
7f2Wv7X3PcGyZcs81sOmTZv01ltveWx77VVWVqaPP/5Y0vf3/QkTJrR5G7Nnz1ZhYWG76jc/n3zz
zTdKSkrSyJEj27UdKyDAgMfxpgpWceLECZ/UuO2227xe53R4rP3Te++9p8cee0zPP/+8v1vxqt/9
7nfq2rWrv9uAj/jj9n7zzTe1bNky/cd//MdZz2eM0YUehfxv//Zvmj59+gVtoyM71z788ssv14IF
C3zYkW8RYFz82vs6xZMBRkpKisaNG+ex7bVXywDDH5qfT8rLy3XDDTe4f0CjIyLAgEdt3bpVZWVl
uuWWW5STkyNJeuONN5SamqrExET3J0lPPvmkEhMTlZCQoF27dvmzZY/66KOPNHToUA0bNky+/H7c
xsZGy7yRffbZZ5WUlKQbb7xRf/3rX31e/9tvv9Uf//hHpaSk6NVXX/V6vYKCAg0fPtzrdU516mPt
6NGjGjlypBISEvTAAw/4vB9f2rx5s2JiYhQTE6PXXntNe/bs0e9//3vNmzfPoy96TlVYWKjk5GSN
GTNG0dHRftl3NX9aumHDBg0ZMkTDhg3z+SfYDQ0Namxs9GqN0+1HMjMzNW3aNI0aNUq//OUvZYyR
0+lURkaGRo4cqRdeeKFNNYwxmjFjhux2u4YNG6aioiLFx8fLZrPpqaeekiRVVFTIbrfrlltuafVi
z1fPYc239+rVq3Xrrbe673uff/65V+oVFBRow4YNuvfee/XOO+/84LEmfX87zJgxQ8nJyaqqqrqg
eoWFhZo9e7ZuvPFGPfDAAxo6dKgWL14sSfrHP/4hm82m1NRU3XbbbV5Z/WKM0cyZMzVs2DCNHDlS
R48e1dNPP63Y2Fjde++9ampq8njNZufzeqnlp7R333237Ha7kpKSdPjwYY/28thjjykhIUEPPvig
JOnkyZOaNGmShg8frrFjx6qurs6j9ZotXbpUb7zxhpKSklRdXe2VGlZ0IfutC3W++1ZPOJ/7eFNT
k0aOHKnExESNGjVKdXV1Wrp0qfbv36+kpCQVFBS0qeb777+vQYMGKT09XQkJCTp8+LBWr16tl19+
WevXr3fvX77++mv3a7fVq1fLbrcrLi7OXS8pKUkPP/zwGV9TNZ8eExOj+fPna+bMmbrpppv0u9/9
TpL097//XTfffLOSkpL00EMPSfr+/v7iiy8qOTlZkvT5559r4sSJuv766911T7fPLS8vV3R0tNLS
0rRz5842XR+n9vz111/rwQcf1Jtvvqn777+/3dvyO//9gisuVomJiaa+vt4YY8wTTzxhFixYYIwx
Zs6cOWbDhg1m165d5s477zTGGFNZWWnGjh3rt1497fHHHzcbN240xhjjcrl8VvfLL780NpvNPPTQ
Q2b37t0+q3s6J06cMMYY88UXX5iEhASf1d21a5eZPn26GTlypFmyZImprq72Sd3777/fHDlyxCe1
TtXysTZjxgzz3nvvGWOMmTJlivnwww/90pMvDB061Hz55ZemoaHBDB482HzzzTfmiSeeMH/5y1+8
Wnfz5s3GZrOZpqYms2fPHjNmzBiv1jud5tt80qRJ5tNPPzXG+G5fs3v3bvPQQw8Zm81mvvzyS6/W
Ot1+5K677jJr1qwxxhiTkZFhysvLjcPhMI899pgxxpilS5eau+6667xrbNiwwTzwwAPuv9PS0sye
PXtMU1OTGTVqlDl06JCZMWOGycvLM8YYM3HiRLNq1SqfPoc1396rVq0yd999tzHGmFdeecW8+OKL
Xqt51113mV27dhljTv9Yu+uuu8zy5cs9Umvz5s3mkUceMVdddZU5fPiwcTqdZsCAAcYY0+q6v/32
282qVas8UrOlv/zlL+a3v/2tMcaYjz76yEybNs0kJCS4H+N9+/b1eM2WzvV66dChQ+bWW281DQ0N
JjY21jQ1NRljPPuYT0xMbHUf3759u3nppZfMihUrjDHG/PnPfzbPPvusx+q11Hz7/9hcyH7rQp3v
vtVTznUfb9lTTk6OWbZsmTHGmMGDB7er3tChQ83//d//mZMnT5q+ffuaQ4cOmVWrVpmXXnrJfPPN
N+45r1u3zrzwwgumqqrK3J254DUAAAj1SURBVHzzzaapqcl8/fXXJjEx0d138+uomJgY89VXX/1g
XsXFxcblcpkrr7zS7NixwzQ2Npobb7zRGGNMenq6+eyzz4wxxkybNs188skn7j6MMebQoUPm5z//
uWlsbDR79uwx48aNc/d/6j43LS3N7Nu3z7hcLhMbG2s2b97cruum+ba4GB53nfwdoODiN2jQIElS
7969VVNToz179qi0tFRJSUmSdFH9ZO6MGTP0n//5n1q3bp1+9atfKTU11Sd1e/TooeLiYpWWlurF
F1/UkSNHNGHCBGVmZio4ONgnPTT74x//qHXr1ikwMNBrnxKezubNm1VSUqJZs2YpPT1d3bp180nd
f/zjH4qMjPRJrbP57LPPFB0dLUmKjo7WgQMHlJCQ4OeuvMPlcqlHjx6SpGuuuUbHjh3zWe1BgwYp
ICBA/fv39+n9+1S//e1v9dxzz+nbb7/VjBkzFBMT45U6jY2NWr16tdavX68+ffro7rvvdn+S5k1n
2o+c+nzy2WefafDgwZK+v99/9NFH511j7969SkxMdP99/Phx9e/fX5J044036uDBgz/YviS/PYe1
nPv27dt9UvNMj7Xm68JTunfvrj59+kiSunTpIkmtrvvm/3vanj179NZbb2nLli0yxig4OFgDBw50
P8Z99TzS7NT7d7Pg4GDNmDFDkydP1k9+8hMtWrTIo721vI8fOHBAe/bs0SeffKLXXntNjY2Nstvt
HqtlJTk5OXrnnXc0evTocx4y5UkXst+6UOe7b/WWU+t8/fXXuu+++3T06FFVV1e363shWnK5XAoP
D5ckXXfdda1O69q1qyIjI/U///M/Wr9+vV5++WUdPHhQn376qYYNGyZJ+vLLL3/Q65VXXqmvvvpK
oaGhrbY3cOBABQYG6vLLL9cNN9yggIAA92vuffv26Z577pEk1dfX6+abb/5Br9ddd506derU6jo/
3T73+PHjuvbaayV5b1/Y0XAICTwuODhYLpfL/XdAQID738YY9evXT4mJiSosLFRhYaE2bdrkjza9
IjQ0VC+//LJWrVqlrKwsn9ePi4vT9OnTFRkZqVdffVX19fU+7+Gll17S5s2b9cYbb3hsGeL5mDlz
pkpKStTU1KRx48YpMzPT6y/y9+/fr5///OderXE2LR9r11xzjfvYyk8++URRUVFer3/06FGv1zid
wMBAVVVVqbGxUQcOHNAVV1zhs9plZWUyxmj//v3q1auXz+qeqnfv3lq2bJkWL16suXPneq1OfX29
Xn31VUVGRmr69OmKjY31Wq2WzrQfOfX55JprrtGOHTskSdu2bWtTjf79+2vLli3uvy+77DLt3btX
xhj993//t66++urTbt9fz2Gnzt0XzvRYCwz07MvHlnNr1vK6b/6/p/Xr108ZGRkqLCzUhx9+qFWr
VmnXrl3ux7i3v9zyXK+XmrlcLmVkZGjt2rXq2bOn3nzzTY/20fI+fs0116hfv3769a9/rcLCQpWU
lCg7O9uj9ZqdOn9fe/jhh1VYWOjT8ELSBe23LtT57ls95Vz38by8PF111VX68MMPlZmZ6a59un3C
+QgKClJNTY0aGhr06aef/uD0iRMnatmyZfrmm290xRVX6Gc/+5kGDhyozZs3q7CwUGVlZWfs9VQt
Tz+132uvvVZr1qxRYWGhtm3bprS0tPN6vJ9un9uzZ08dOHDA/dwEiRUY8LixY8cqIyNDt95662lP
HzhwoKKiopSYmKjAwECNGjXK4y/Ajx8/rqVLl/r8y69+//vf680335TT6fTpt0p/9913Wrp0qTZs
2KABAwbo/vvvdyfHvhYfH6/4+HjFxMT4/NOrbt266Z577tE999yjvXv36u9//7tX623atEm33HKL
V2ucTcvHWlZWlu666y49+eSTuu6667y++sLpdOr2229XUVGRV+uczpNPPqnRo0crICBADzzwgE+/
5DA0NFRjxozRF198oRUrVvis7qkWLFigrVu3qqGhQTNnzvRanfDwcG3fvl07duzQihUr9Omnn+oX
v/iFpk+frksuucRrdc93P/LLX/5Sf/7znzVixIg2h4ljxozRpk2bFB8fr+DgYM2fP19Tp06VMUaj
R49W3759NWfOHN1xxx167rnnFBISIsk3z2FW4c/H2pw5c3T77bfr+eefV9euXb2ymnDMmDEqKCjQ
sGHDFBAQoF/96ldKTk5WbGysBg8erO7du3u8Zkvner3UrL6+Xr/4xS8UEBCggIAArVu3zqN9vPfe
e1q4cKFuuOEGDR48WAMGDNC9996rVatWSZIeeeQRjR492qM1Jen666/XY489pvT0dP3hD39QWFiY
x2tY0YXsty6Ur1+jnes+HhMToyeffFI7duxQz5493Star732Wt166616+OGHZbPZzrvewoULNWLE
CF111VW6/PLLf7DfuPnmmzVlyhQtXLhQ0vcrmG+77TYlJiYqKChI119/vf7rv/6rnbP9p8WLF2va
tGk6efKkgoKCtHLlSsXGxurOO+/U3/72Nz355JOnvdzp9rnZ2dm644479NOf/tTr+6SOIsD48iNS
+IzT6TzteGBgoMc+OfFFDZyf6upqbdq0SePHj3cvv/U2f93+VrrfTZgwQevWrfPqGznJWnNu9vHH
H6u8vFz//u//7rUaVpt3YWGh3n33XT333HNer2W1uTc7efKk3nzzTaWkpLiX6V4Iq87T1/xxPVj9
unc6nerU6fvP2e644w49+OCDGjp0qJ+7ah8rXNdW6AG+4+vb2yr1XC6XLrnkEn333XeKjo7Wjh07
Lugwv474uOmIPbfVxTELtHL06FEFBwef9r/mxLEj1MD5Cw8P1x133OGz8MJft7/V7nfjx4/3enhh
tTk3GzJkiFfDC6vO2xesPPcuXbrojjvu8Eh4YeV5+pI/roeOcN0fOXJEdrtdsbGxCgkJ6bDhhRWu
ayv0AN/x9e1tpXo33nijkpKSFBsbq1mzZl1QeNERHzcdsef2YAUGAAAAAACwPFZgAAAAAAAAyyPA
AAAAAAAAlkeAAQAAAAAALI8AAwAAAAAAWB4BBgAAAAAAsLz/Byf5uN9NkeDaAAAAAElFTkSuQmCC
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Plotting the token frequencies as a bar chart gives you a clearer picture of your data. Not too surprisingly, the most common token turns out to be the period, followed by other common syntactic tokens like curly braces and keywords such as <em>if</em> and <em>return</em>.</p>
</div>
</div>
</div>
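<p>To make the idea concrete, here is a minimal sketch of how such token frequencies could be computed before plotting. This is not the notebook's actual code; the function and variable names are illustrative, and it assumes you already have your samples tokenized into lists of strings.</p>

```python
from collections import Counter

def token_frequencies(token_lists, top_n=10):
    """Count tokens across tokenized samples and return the top_n most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(top_n)

# Tiny illustration on two made-up "code" samples
samples = [["if", "(", "x", ")", "{", "return", "x", ";", "}"],
           ["return", ";", "."]]
print(token_frequencies(samples, top_n=3))
```

<p>The resulting list of (token, count) pairs is exactly what you would feed into a bar chart like the one above.</p>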
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">plot_hist</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="n">n_bins</span> <span class="o">=</span> <span class="mi">50</span><span class="p">):</span>
<span class="n">n</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">patches</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lens</span><span class="p">,</span> <span class="n">n_bins</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s1">'blue'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">code_lens</span><span class="p">),</span> <span class="n">median</span><span class="p">(</span><span class="n">code_lens</span><span class="p">),</span> <span class="n">stdev</span><span class="p">(</span><span class="n">code_lens</span><span class="p">))</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">code_lens</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">locs</span><span class="p">),</span> <span class="n">median</span><span class="p">(</span><span class="n">locs</span><span class="p">),</span> <span class="n">stdev</span><span class="p">(</span><span class="n">locs</span><span class="p">))</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">locs</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">comment_lens</span><span class="p">),</span> <span class="n">median</span><span class="p">(</span><span class="n">comment_lens</span><span class="p">),</span> <span class="n">stdev</span><span class="p">(</span><span class="n">comment_lens</span><span class="p">))</span>
<span class="n">plot_hist</span><span class="p">(</span><span class="n">comment_lens</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>3.4662519750610943 3.0 2.6490695431339177
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAD4CAYAAADCb7BPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAUDUlEQVR4nO3dbYyd5X3n8e+vOCQsbbAJXgvbaM0q
ViIqNQRG4ChR1RIVDFvFvIgioqpYkRW/CFklUqUWdqVFTfqCvCkNUoqEAsVU2RCWNouFkrheB2ml
lXgYB8KTw3qaBGEb8ATzsC1SsqT/fXGuWQ7TsX3msjlzxv5+pKNz3//7us/1n9Gxfr4fzplUFZIk
LdZvLHUDkqTlyQCRJHUxQCRJXQwQSVIXA0SS1GXFUjcwLuedd15t2LBhqduQpGVl7969v6iq1Qtt
O20CZMOGDUxPTy91G5K0rCR5/mjbPIUlSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaI
JKmLASJJ6nLafBL9RKxdu3D90KHx9iFJk2SkI5AkK5Pcn+QnSfYl+ViSc5PsTrK/Pa9qY5PktiQz
SZ5McsnQ62xt4/cn2TpUvzTJU22f25Kk1Rc9hyRpPEY9hfV14AdV9WHgI8A+4EZgT1VtBPa0dYCr
gY3tsR24HQZhANwMXA5cBtw8FwhtzOeH9tvc6ouaQ5I0PscNkCTnAL8L3AlQVb+qqteALcCONmwH
cG1b3gLcUwMPAyuTnA9cBeyuqiNV9SqwG9jctr2/qh6uwR9ov2feay1mDknSmIxyBHIhMAv8TZLH
k3wzydnAmqp6sY15CVjTltcBLwztf6DVjlU/sECdjjneIcn2JNNJpmdnZ0f4USVJoxolQFYAlwC3
V9VHgX/m7VNJALQjhzr57Z3YHFV1R1VNVdXU6tULfp29JKnTKAFyADhQVY+09fsZBMrLc6eN2vPh
tv0gcMHQ/utb7Vj19QvU6ZhDkjQmxw2QqnoJeCHJh1rpk8CzwE5g7k6qrcADbXkncH27U2oT8Ho7
DbULuDLJqnbx/EpgV9v2RpJN7e6r6+e91mLmkCSNyaifA/mPwLeSnAn8FPgcg/C5L8k24HngM23s
94BrgBngzTaWqjqS5KvAY23cV6rqSFv+AnA3cBbw/fYAuGUxc0iSxieDSwunvqmpqer9k7Z+kFDS
6SrJ3qqaWmibX2UiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6
GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6
GCCSpC4GiCSpiwEiSeoyUoAk+XmSp5I8kWS61c5NsjvJ/va8qtWT5LYkM0meTHLJ0OtsbeP3J9k6
VL+0vf5M2ze9c0iSxmMxRyC/X1UXV9VUW78R2FNVG4E9bR3gamBje2wHbodBGAA3A5cDlwE3zwVC
G/P5of0298whSRqfEzmFtQXY0ZZ3ANcO1e+pgYeBlUnOB64CdlfVkap6FdgNbG7b3l9VD1dVAffM
e63FzCFJGpNRA6SAf0iyN8n2VltTVS+25ZeANW15HfDC0L4HWu1Y9QML1HvmeIck25NMJ5menZ0d
6QeVJI1mxYjjPlFVB5P8W2B3kp8Mb6yqSlInv70Tm6Oq7gDuAJiamnpX+5Ok081IRyBVdbA9Hwa+
y+Aaxstzp43a8+E2/CBwwdDu61vtWPX1C9TpmEOSNCbHDZAkZyf5rbll4ErgaWAnMHcn1Vbggba8
E7i+3Sm1CXi9nYbaBVyZZFW7eH4lsKtteyPJpnb31fXzXmsxc0iSxmSUU1hrgO+2O2tXAP+1qn6Q
5DHgviTbgOeBz7Tx3wOuAWaAN4HPAVTVkSRfBR5r475SVUfa8heAu4GzgO+3B8Ati5lDkjQ+Gdz4
dOqbmpqq6enprn3Xrl24fujQCTQkSctAkr1DH994Bz+JLknqYoBIkroYIJKkLgaIJKmLASJJ6mKA
SJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKA
SJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqMnKAJDkjyeNJHmzrFyZ5JMlMku8k
ObPV39vWZ9r2DUOvcVOrP5fkqqH65labSXLjUH3Rc0iSxmMxRyBfAvYNrX8NuLWqPgi8Cmxr9W3A
q61+axtHkouA64DfBjYDf91C6QzgG8DVwEXAZ9vYRc8hSRqfkQIkyXrgPwDfbOsBrgDub0N2ANe2
5S1tnbb9k238FuDeqvplVf0MmAEua4+ZqvppVf0KuBfY0jmHJGlMRj0C+SvgT4F/aesfAF6rqrfa
+gFgXVteB7wA0La/3sb///q8fY5W75njHZJsTzKdZHp2dnbEH1WSNIrjBkiSPwQOV9XeMfRzUlXV
HVU1VVVTq1evXup2JOmUsmKEMR8HPpXkGuB9wPuBrwMrk6xoRwDrgYNt/EHgAuBAkhXAOcArQ/U5
w/ssVH+lYw5J0pgc9wikqm6qqvVVtYHBRfAfVtUfAQ8Bn27DtgIPtOWdbZ22/YdVVa1+XbuD6kJg
I/Ao8Biwsd1xdWabY2fbZ7FzSJLGZJQjkKP5M+DeJH8BPA7c2ep3An+bZAY4wiAQqKpnktwHPAu8
BdxQVb8GSPJFYBdwBnBXVT3TM4ckaXxyuvzHfWpqqqanp7v2Xbt24fqhQyfQkCQtA0n2VtXUQtv8
JLokqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQu
BogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQu
xw2QJO9L8miSHyd5Jsmft/qFSR5JMpPkO0nObPX3tvWZtn3D0Gvd1OrPJblqqL651WaS3DhUX/Qc
kqTxGOUI5JfAFVX1EeBiYHOSTcDXgFur6oPAq8C2Nn4b8Gqr39rGkeQi4Drgt4HNwF8nOSPJGcA3
gKuBi4DPtrEsdg5J0vgcN0Bq4J/a6nvao4ArgPtbfQdwbVve0tZp2z+ZJK1+b1X9sqp+BswAl7XH
TFX9tKp+BdwLbGn7LHYOSdKYjHQNpB0pPAEcBnYD/wi8VlVvtSEHgHVteR3wAkDb/jrwgeH6vH2O
Vv9AxxySpDEZKUCq6tdVdTGwnsERw4ff1a5OkiTbk0wnmZ6dnV3qdiTplLKou7Cq6jXgIeBjwMok
K9qm9cDBtnwQuACgbT8HeGW4Pm+fo9Vf6Zhjfr93VNVUVU2tXr16MT+qJOk4RrkLa3WSlW35LOAP
gH0MguTTbdhW4IG2vLOt07b/sKqq1a9rd1BdCGwEHgUeAza2O67OZHChfWfbZ7FzSJLGZMXxh3A+
sKPdLfUbwH1V9WCSZ4F7k/wF8DhwZxt/J/C3SWaAIwwCgap6Jsl9wLPAW8ANVfVrgCRfBHYBZwB3
VdUz7bX+bDFzSJLGJ6fLf9ynpqZqenq6a9+1axeuHzp0Ag1J0jKQZG9VTS20zU+iS5K6GCCSpC4G
iCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4G
iCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKnLiqVu4FS0du3Rtx06NL4+
JOnd5BGIJKnLcQMkyQVJHkrybJJnknyp1c9NsjvJ/va8qtWT5LYkM0meTHLJ0GttbeP3J9k6VL80
yVNtn9uSpHcOSdJ4jHIE8hbwJ1V1EbAJuCHJRcCNwJ6q2gjsaesAVwMb22M7cDsMwgC4GbgcuAy4
eS4Q2pjPD+23udUXNYckaXyOGyBV9WJV/agt/x9gH7AO2ALsaMN2ANe25S3APTXwMLAyyfnAVcDu
qjpSVa8Cu4HNbdv7q+rhqirgnnmvtZg5JEljsqhrIEk2AB8FHgHWVNWLbdNLwJq2vA54YWi3A612
rPqBBep0zDG/3+1JppNMz87OjvZDSpJGMnKAJPlN4O+AL1fVG8Pb2pFDneTe3qFnjqq6o6qmqmpq
9erV71JnknR6GilAkryHQXh8q6r+vpVfnjtt1J4Pt/pB4IKh3de32rHq6xeo98whSRqTUe7CCnAn
sK+q/nJo005g7k6qrcADQ/Xr251Sm4DX22moXcCVSVa1i+dXArvatjeSbGpzXT/vtRYzhyRpTEb5
IOHHgT8GnkryRKv9J+AW4L4k24Dngc+0bd8DrgFmgDeBzwFU1ZEkXwUea+O+UlVH2vIXgLuBs4Dv
tweLnUOSND4ZXFo49U1NTdX09HTXvkf7ZPnRPlXuJ9ElnSqS7K2qqYW2+Ul0SVIXA0SS1MUAkSR1
MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUZ5csUdRTH+s4rSTrVeQQiSepi
gEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepy
3ABJcleSw0meHqqdm2R3kv3teVWrJ8ltSWaSPJnkkqF9trbx+5NsHapfmuSpts9tSdI7hyRpfEY5
Arkb2DyvdiOwp6o2AnvaOsDVwMb22A7cDoMwAG4GLgcuA26eC4Q25vND+23umUOSNF7HDZCq+p/A
kXnlLcCOtrwDuHaofk8NPAysTHI+cBWwu6qOVNWrwG5gc9v2/qp6uKoKuGfeay1mDknSGPVeA1lT
VS+25ZeANW15HfDC0LgDrXas+oEF6j1z/CtJtieZTjI9Ozs74o8mSRrFCV9Eb0cOdRJ6OelzVNUd
VTVVVVOrV69+FzqTpNNXb4C8PHfaqD0fbvWDwAVD49a32rHq6xeo98whSRqj3gDZCczdSbUVeGCo
fn27U2oT8Ho7DbULuDLJqnbx/EpgV9v2RpJN7e6r6+e91mLmkCSN0YrjDUjybeD3gPOSHGBwN9Ut
wH1JtgHPA59pw78HXAPMAG8CnwOoqiNJvgo81sZ9parmLsx/gcGdXmcB328PFjuHJGm8Mri8cOqb
mpqq6enprn3Xrj15fRw6dPJeS5LebUn2VtXUQtv8JLokqctxT2Hp5Frs0YxHLJImlUcgkqQuBogk
qYsBIknqYoBIkrp4EX3CHe2iuxfXJS01j0AkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLU
xQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFvweyTPl3QiQtNY9AJEldDBBJ
UpdleworyWbg68AZwDer6pYlbmkiHO3U1tF4yktSr2V5BJLkDOAbwNXARcBnk1y0tF1J0ulluR6B
XAbMVNVPAZLcC2wBnl3SrpahxR6xHI1HMtLpZ7kGyDrghaH1A8Dl8wcl2Q5sb6v/lOS5EV//POAX
J9Th+C1pz8mid/F3PB72/O5bbv3C4nr+d0fbsFwDZCRVdQdwx2L3SzJdVVPvQkvvmuXW83LrF+x5
XJZbz8utXzh5PS/LayDAQeCCofX1rSZJGpPlGiCPARuTXJjkTOA6YOcS9yRJp5VleQqrqt5K8kVg
F4PbeO+qqmdO4hSLPu01AZZbz8utX7DncVluPS+3fuEk9ZyqOhmvI0k6zSzXU1iSpCVmgEiSuhgg
Q5JsTvJckpkkNy51PwtJcleSw0meHqqdm2R3kv3tedVS9jhfkguSPJTk2STPJPlSq09s30nel+TR
JD9uPf95q1+Y5JH2HvlOu4ljYiQ5I8njSR5s65Pe78+TPJXkiSTTrTax7wuAJCuT3J/kJ0n2JfnY
JPec5EPt9zv3eCPJl09GzwZIs4y+HuVuYPO82o3AnqraCOxp65PkLeBPquoiYBNwQ/vdTnLfvwSu
qKqPABcDm5NsAr4G3FpVHwReBbYtYY8L+RKwb2h90vsF+P2qunjocwmT/L6AwXfw/aCqPgx8hMHv
e2J7rqrn2u/3YuBS4E3gu5yMnqvKx+BGgo8Bu4bWbwJuWuq+jtLrBuDpofXngPPb8vnAc0vd43H6
fwD4g+XSN/BvgB8x+LaDXwArFnrPLPWDweeh9gBXAA8CmeR+W08/B86bV5vY9wVwDvAz2g1Iy6Hn
eX1eCfyvk9WzRyBvW+jrUdYtUS+LtaaqXmzLLwFrlrKZY0myAfgo8AgT3nc7HfQEcBjYDfwj8FpV
vdWGTNp75K+APwX+pa1/gMnuF6CAf0iyt331EEz2++JCYBb4m3aq8JtJzmayex52HfDttnzCPRsg
p5ga/HdiIu/NTvKbwN8BX66qN4a3TWLfVfXrGhz2r2fwBZ4fXuKWjirJHwKHq2rvUveySJ+oqksY
nDq+IcnvDm+cwPfFCuAS4Paq+ijwz8w79TOBPQPQrn99Cvhv87f19myAvG05fz3Ky0nOB2jPh5e4
n38lyXsYhMe3qurvW3ni+waoqteAhxicAlqZZO4DuJP0Hvk48KkkPwfuZXAa6+tMbr8AVNXB9nyY
wXn5y5js98UB4EBVPdLW72cQKJPc85yrgR9V1ctt/YR7NkDetpy/HmUnsLUtb2VwjWFiJAlwJ7Cv
qv5yaNPE9p1kdZKVbfksBtds9jEIkk+3YRPTc1XdVFXrq2oDg/fuD6vqj5jQfgGSnJ3kt+aWGZyf
f5oJfl9U1UvAC0k+1EqfZPBnJCa25yGf5e3TV3Ayel7qizqT9ACuAf43g3Pd/3mp+zlKj98GXgT+
L4P/DW1jcK57D7Af+B/AuUvd57yeP8Hg8PhJ4In2uGaS+wZ+B3i89fw08F9a/d8DjwIzDE4FvHep
e12g998DHpz0fltvP26PZ+b+zU3y+6L1dzEw3d4b/x1YtQx6Pht4BThnqHbCPftVJpKkLp7CkiR1
MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUpf/BxrCEguPRLjZAAAAAElFTkSuQmCC
" />
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>18.54957629991187 10 50.99032748692644
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYQAAAD4CAYAAADsKpHdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAVGElEQVR4nO3db4ydZ3nn8e+vMYEsbbBNvJb/RGuj
WlTpi4bkKHFEVXVhcZy0wnmBUKJq7c1m8WoDK9hdqZtsX0SFvoDVqhRradqIUBxECWkKGytK6vWa
SPsqIeNC85esJ6HZ2E7iAefPtkjQ0GtfnMtwcMb2sT2eGc98P9LRuZ/rvp/n3Pc8Jr85z3nOkKpC
kqRfmOsJSJLmBwNBkgQYCJKkZiBIkgADQZLUlsz1BE7XRRddVOvWrZvraUjSOWPfvn3fr6oVx+s/
ZwNh3bp1TExMzPU0JOmckeT5E/V7yUiSBBgIkqRmIEiSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQ
JEnAOfxN5TOxevX09UOHZncekjSf+A5BkgQYCJKkZiBIkgADQZLUDARJEjBGICR5d5LvjDxeT/KJ
JMuT7Emyv5+X9fgk2ZFkMsljSS4bOda2Hr8/ybaR+uVJHu99diTJ2VmuJOl4ThoIVfVMVV1aVZcC
lwM/BL4B3ALsraoNwN7eBrgG2NCP7cDtAEmWA7cBVwJXALcdDZEe85GR/TbPyOokSWM71UtG7wee
rarngS3Azq7vBK7r9hbgrhp6GFiaZBVwNbCnqo5U1SvAHmBz911YVQ9XVQF3jRxLkjRLTjUQrge+
2u2VVfVit18CVnZ7DfDCyD4Hunai+oFp6pKkWTR2ICQ5H/gg8BfH9vVv9jWD8zreHLYnmUgyMTU1
dbZfTpIWlVN5h3AN8NdV9XJvv9yXe+jnw10/CFw8st/arp2ovnaa+ptU1R1VNaiqwYoVK05h6pKk
kzmVQLiBn10uAtgFHL1TaBtw30h9a99ttBF4rS8t7QY2JVnWHyZvAnZ33+tJNvbdRVtHjiVJmiVj
/XG7JG8HPgD825Hyp4F7ktwEPA98uOsPANcCkwzvSLoRoKqOJPkU8GiP+2RVHen2zcCXgAuAB/sh
SZpFGV7+P/cMBoOamJg4rX39a6eSFqMk+6pqcLx+v6ksSQIMBElSMxAkSYCBIElqBoIkCTAQJEnN
QJAkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAA0GS1AwESRJgIEiSmoEgSQIMBElSMxAkSYCB
IElqYwVCkqVJ7k3y3SRPJ7kqyfIke5Ls7+dlPTZJdiSZTPJYkstGjrOtx+9Psm2kfnmSx3ufHUky
80uVJJ3IuO8QPgf8VVX9CvBrwNPALcDeqtoA7O1tgGuADf3YDtwOkGQ5cBtwJXAFcNvREOkxHxnZ
b/OZLUuSdKpOGghJ3gH8BnAnQFX9uKpeBbYAO3vYTuC6bm8B7qqhh4GlSVYBVwN7qupIVb0C7AE2
d9+FVfVwVRVw18ixJEmzZJx3COuBKeDPknw7yReSvB1YWVUv9piXgJXdXgO8MLL/ga6dqH5gmvqb
JNmeZCLJxNTU1BhTlySNa5xAWAJcBtxeVe8B/p6fXR4CoH+zr5mf3s+rqjuqalBVgxUrVpztl5Ok
RWWcQDgAHKiqR3r7XoYB8XJf7qGfD3f/QeDikf3Xdu1E9bXT1CVJs+ikgVBVLwEvJHl3l94PPAXs
Ao7eKbQNuK/bu4CtfbfRRuC1vrS0G9iUZFl/mLwJ2N19ryfZ2HcXbR05liRpliwZc9y/B76S5Hzg
OeBGhmFyT5KbgOeBD/fYB4BrgUnghz2WqjqS5FPAoz3uk1V1pNs3A18CLgAe7IckaRZlePn/3DMY
DGpiYuK09l29evr6oUNnMCFJmueS7KuqwfH6/aayJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GS
BBgIkqRmIEiSAANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAFjBkKS
v03yeJLvJJno2vIke5Ls7+dlXU+SHUkmkzyW5LKR42zr8fuTbBupX97Hn+x9M9MLlSSd2Km8Q/jn
VXXpyP9B8y3A3qraAOztbYBrgA392A7cDsMAAW4DrgSuAG47GiI95iMj+20+7RVJkk7LmVwy2gLs
7PZO4LqR+l019DCwNMkq4GpgT1UdqapXgD3A5u67sKoerqoC7ho5liRplowbCAX8zyT7kmzv2sqq
erHbLwEru70GeGFk3wNdO1H9wDT1N0myPclEkompqakxpy5JGseSMcf9elUdTPJPgT1JvjvaWVWV
pGZ+ej+vqu4A7gAYDAZn/fUkaTEZ6x1CVR3s58PANxh+BvByX+6hnw/38IPAxSO7r+3aieprp6lL
kmbRSQMhyduT/NLRNrAJeALYBRy9U2gbcF+3dwFb+26jjcBrfWlpN7ApybL+MHkTsLv7Xk+yse8u
2jpyLEnSLBnnktFK4Bt9J+gS4M+r6q+SPArck+Qm4Hngwz3+AeBaYBL4IXAjQFUdSfIp4NEe98mq
OtLtm4EvARcAD/ZDkjSLMryx59wzGAxqYmLitPZdvXr6+qFDZzAhSZrnkuwb+erAm/hNZUkSYCBI
kpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GSBBgIkqRmIEiS
AANBktQMBEkSYCBIkpqBIEkCTiEQkpyX5NtJ7u/t9UkeSTKZ5GtJzu/6W3t7svvXjRzj1q4/k+Tq
kfrmrk0muWXmlidJGtepvEP4OPD0yPZngM9W1S8DrwA3df0m4JWuf7bHkeQS4HrgV4HNwB93yJwH
fB64BrgEuKHHSpJm0ViBkGQt8FvAF3o7wPuAe3vITuC6bm/pbbr//T1+C3B3Vf2oqr4HTAJX9GOy
qp6rqh8Dd/dYSdIsGvcdwh8Bvwv8Y2+/E3i1qt7o7QPAmm6vAV4A6P7XevxP68fsc7z6myTZnmQi
ycTU1NSYU5ckjeOkgZDkt4HDVbVvFuZzQlV1R1UNqmqwYsWKuZ6OJC0oS8YY817gg0muBd4GXAh8
DliaZEm/C1gLHOzxB4GLgQNJlgDvAH4wUj9qdJ/j1SVJs+Sk7xCq6taqWltV6xh+KPzNqvod4CHg
Qz1sG3Bft3f1Nt3/zaqqrl/fdyGtBzYA3wIeBTb0XUvn92vsmpHVSZLGNs47hOP5z8DdSf4A+DZw
Z9fvBL6cZBI4wvA/8FTVk0nuAZ4C3gA+WlU/AUjyMWA3cB7wxap68gzmJUk6DRn+8n7uGQwGNTEx
cVr7rl49ff3QoTOYkCTNc0n2VdXgeP1+U1mSBBgIkqRmIEiSAANBktQMBEkSYCBIkpqBIEkCDARJ
UjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1A0GSBBgIkqRmIEiSgDECIcnbknwr
yd8keTLJ73d9fZJHkkwm+VqS87v+1t6e7P51I8e6tevPJLl6pL65a5NJbpn5ZUqSTmacdwg/At5X
Vb8GXApsTrIR+Azw2ar6ZeAV4KYefxPwStc/2+NIcglwPfCrwGbgj5Ocl+Q84PPANcAlwA09VpI0
i04aCDX0d735ln4U8D7g3q7vBK7r9pbepvvfnyRdv7uqflRV3wMmgSv6MVlVz1XVj4G7e6wkaRaN
9RlC/yb/HeAwsAd4Fni1qt7oIQeANd1eA7wA0P2vAe8crR+zz/Hq081je5KJJBNTU1PjTF2SNKax
AqGqflJVlwJrGf5G/ytndVbHn8cdVTWoqsGKFSvmYgqStGCd0l1GVfUq8BBwFbA0yZLuWgsc7PZB
4GKA7n8H8IPR+jH7HK8uSZpF49xltCLJ0m5fAHwAeJphMHyoh20D7uv2rt6m+79ZVdX16/supPXA
BuBbwKPAhr5r6XyGHzzvmonFSZLGt+TkQ1gF7Oy7gX4BuKeq7k/yFHB3kj8Avg3c2ePvBL6cZBI4
wvA/8FTVk0nuAZ4C3gA+WlU/AUjyMWA3cB7wxap6csZWKEkaS4a/vJ97BoNBTUxMnNa+q1dPXz90
6AwmJEnzXJJ9VTU4Xr/fVJYkAQaCJKkZCJIkwECQJDUDQZIEGAiSpGYgSJIAA0GS1AwESRJgIEiS
moEgSQIMBElSMxAkSYCBIElqBoIkCTAQJEnNQJAkAQaCJKkZCJIkYIxASHJxkoeSPJXkySQf7/ry
JHuS7O/nZV1Pkh1JJpM8luSykWNt6/H7k2wbqV+e5PHeZ0eSnI3FSpKOb5x3CG8A/6mqLgE2Ah9N
cglwC7C3qjYAe3sb4BpgQz+2A7fDMECA24ArgSuA246GSI/5yMh+m898aZKkU3HSQKiqF6vqr7v9
/4CngTXAFmBnD9sJXNftLcBdNfQwsDTJKuBqYE9VHamqV4A9wObuu7CqHq6qAu4aOZYkaZac0mcI
SdYB7wEeAVZW1Yvd9RKwsttrgBdGdjvQtRPVD0xTn+71tyeZSDIxNTV1KlOXJJ3E2IGQ5BeBvwQ+
UVWvj/b1b/Y1w3N7k6q6o6oGVTVYsWLF2X45SVpUxgqEJG9hGAZfqaqvd/nlvtxDPx/u+kHg4pHd
13btRPW109QlSbNonLuMAtwJPF1VfzjStQs4eqfQNuC+kfrWvttoI/BaX1raDWxKsqw/TN4E7O6+
15Ns7NfaOnIsSdIsWTLGmPcC/xJ4PMl3uvZfgE8D9yS5CXge+HD3PQBcC0wCPwRuBKiqI0k+BTza
4z5ZVUe6fTPwJeAC4MF+SJJmUYaX/889g8GgJiYmTmvf1aunrx86dAYTkqR5Lsm+qhocr99vKkuS
AANBktQMBEkSYCBIkpqBIEkCDARJUjMQJEmAgSBJagaCJAkwECRJzUCQJAEGgiSpGQiSJMBAkCQ1
A0GSBBgIkqRmIEiSAANBktQMBEkSMEYgJPliksNJnhipLU+yJ8n+fl7W9STZkWQyyWNJLhvZZ1uP
359k20j98iSP9z47kmSmFylJOrlx3iF8Cdh8TO0WYG9VbQD29jbANcCGfmwHbodhgAC3AVcCVwC3
HQ2RHvORkf2OfS1J0iw4aSBU1f8GjhxT3gLs7PZO4LqR+l019DCwNMkq4GpgT1UdqapXgD3A5u67
sKoerqoC7ho5liRpFp3uZwgrq+rFbr8ErOz2GuCFkXEHunai+oFp6pKkWXbGHyr3b/Y1A3M5qSTb
k0wkmZiampqNl5SkReN0A+HlvtxDPx/u+kHg4pFxa7t2ovraaerTqqo7qmpQVYMVK1ac5tQlSdM5
3UDYBRy9U2gbcN9IfWvfbbQReK0vLe0GNiVZ1h8mbwJ2d9/rSTb23UVbR44lSZpFS042IMlXgd8E
LkpygOHdQp8G7klyE/A88OEe/gBwLTAJ/BC4EaCqjiT5FPBoj/tkVR39oPpmhncyXQA82A9J0izL
8COAc89gMKiJiYnT2nf16unrhw6dwYQkaZ5Lsq+qBsfr95vKkiTAQJAkNQNBkgQYCJKkZiBIkgAD
QZLUDARJEmAgSJKagSBJAgwESVIzECRJgIEgSWoGgiQJMBAkSc1AkCQBBoIkqRkIkiTAQJAkNQNB
kgQYCJKkNm8CIcnmJM8kmUxyy1zPR5IWm3kRCEnOAz4PXANcAtyQ5JK5nZUkLS5L5noC7Qpgsqqe
A0hyN7AFeGo2J7F69fT1Q4dmcxaSNDfmSyCsAV4Y2T4AXHnsoCTbge29+XdJnjnN17sI+P64g5PT
fJX55ZTWvEC45sXBNY/vn52oc74Ewliq6g7gjjM9TpKJqhrMwJTOGa55cXDNi8PZWvO8+AwBOAhc
PLK9tmuSpFkyXwLhUWBDkvVJzgeuB3bN8ZwkaVGZF5eMquqNJB8DdgPnAV+sqifP4kue8WWnc5Br
Xhxc8+JwVtacqjobx5UknWPmyyUjSdIcMxAkScAiC4SF9Ocxklyc5KEkTyV5MsnHu748yZ4k+/t5
WdeTZEev/bEkl40ca1uP359k21ytaVxJzkvy7ST39/b6JI/02r7WNyaQ5K29Pdn960aOcWvXn0ly
9dysZDxJlia5N8l3kzyd5KqFfp6T/If+d/1Ekq8medtCO89JvpjkcJInRmozdl6TXJ7k8d5nRzLG
N6qqalE8GH5Y/SzwLuB84G+AS+Z6XmewnlXAZd3+JeD/MPyzH/8VuKXrtwCf6fa1wINAgI3AI11f
DjzXz8u6vWyu13eStf9H4M+B+3v7HuD6bv8J8O+6fTPwJ92+Hvhaty/p8/9WYH3/uzhvrtd1gvXu
BP5Nt88Hli7k88zwi6rfAy4YOb//aqGdZ+A3gMuAJ0ZqM3ZegW/12PS+15x0TnP9Q5nFH/5VwO6R
7VuBW+d6XjO4vvuADwDPAKu6tgp4ptt/CtwwMv6Z7r8B+NOR+s+Nm28Pht9R2Qu8D7i//7F/H1hy
7HlmeNfaVd1e0uNy7LkfHTffHsA7+j+OOaa+YM8zP/vLBcv7vN0PXL0QzzOw7phAmJHz2n3fHan/
3LjjPRbTJaPp/jzGmjmay4zqt8jvAR4BVlbVi931ErCy28db/7n2c/kj4HeBf+ztdwKvVtUbvT06
/5+urftf6/Hn0prXA1PAn/Vlsi8keTsL+DxX1UHgvwH/F3iR4Xnbx8I+z0fN1Hld0+1j6ye0mAJh
QUryi8BfAp+oqtdH+2r4q8GCua84yW8Dh6tq31zPZRYtYXhZ4faqeg/w9wwvJfzUAjzPyxj+ccv1
wGrg7cDmOZ3UHJiL87qYAmHB/XmMJG9hGAZfqaqvd/nlJKu6fxVwuOvHW/+59HN5L/DBJH8L3M3w
stHngKVJjn7JcnT+P11b978D+AHn1poPAAeq6pHevpdhQCzk8/wvgO9V1VRV/QPwdYbnfiGf56Nm
6rwe7Pax9RNaTIGwoP48Rt8xcCfwdFX94UjXLuDonQbbGH62cLS+te9W2Ai81m9NdwObkizr38w2
dW3eqapbq2ptVa1jeP6+WVW/AzwEfKiHHbvmoz+LD/X46vr1fXfKemADww/g5p2qegl4Icm7u/R+
hn8WfsGeZ4aXijYm+Sf97/zomhfseR4xI+e1+15PsrF/hltHjnV8c/2hyix/gHMtw7txngV+b67n
c4Zr+XWGbycfA77Tj2sZXjvdC+wH/hewvMeH4f8J0bPA48Bg5Fj/Gpjsx41zvbYx1/+b/Owuo3cx
/B/6JPAXwFu7/rbenuz+d43s/3v9s3iGMe6+mOO1XgpM9Ln+HwzvJlnQ5xn4feC7wBPAlxneKbSg
zjPwVYafkfwDw3eCN83keQUG/fN7FvjvHHNjwnQP/3SFJAlYXJeMJEknYCBIkgADQZLUDARJEmAg
SJKagSBJAgwESVL7/27i0eAlIZFDAAAAAElFTkSuQmCC
" />
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>3.57512896650546 3.0 2.605938655784157
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAD4CAYAAADCb7BPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAASW0lEQVR4nO3df6zddX3H8edrVBTZpAVuCG3J2sVG
05mpeIM1mmWTDQozlj+MwZjRmMb+IW46l2xlS0am/2iyyCRREiIoJEZk6EZj1K6rJMuWgNyqkx+V
ceeP0RbolfJjm4kOfe+P86k7XO+90M8t55zbPh/Jyfl+39/P93ze9x701e+Pc26qCkmSjtevjLsB
SdLKZIBIkroYIJKkLgaIJKmLASJJ6rJq3A2MyrnnnlsbNmwYdxuStKLs37//R1U1tdC2UyZANmzY
wMzMzLjbkKQVJckPF9vmKSxJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElS
l1Pmk+jLsXbtwvXDh0fbhyRNEo9AJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1
MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUpfnDZAkNyc5kuT+odrZSfYmebg9r2n1JLk+yWyS
7yS5cGif7W38w0m2D9XfkOS+ts/1SdI7hyRpdF7IEchnga3zaruAfVW1CdjX1gEuAza1x07gBhiE
AXAt8EbgIuDaY4HQxrx3aL+tPXNIkkbreQOkqv4ZODqvvA24pS3fAlwxVL+1Bu4GVic5H7gU2FtV
R6vqSWAvsLVte0VV3V1VBdw677WOZw5J0gj1XgM5r6oebcuPAee15XXAI0PjDrbaUvWDC9R75vgl
SXYmmUkyMzc39wJ/NEnSC7Hsi+jtyKFOQC8nfI6qurGqpqtqempq6kXoTJJOXb0B8vix00bt+Uir
HwIuGBq3vtWWqq9foN4zhyRphHoDZDdw7E6q7cCdQ/Wr2p1SW4Cn22moPcAlSda0i+eXAHvatmeS
bGl3X10177WOZw5J0giter4BST4P/A5wbpKDDO6m+ihwe5IdwA+Bd7bhXwEuB2aBHwPvAaiqo0k+
Atzbxn24qo5dmH8fgzu9zgC+2h4c7xySpNHK4PLCyW96erpmZma69l27duH64cPLaEiSVoAk+6tq
eqFtfhJdktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0M
EElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0M
EElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZVkBkuRPkjyQ5P4kn0/ysiQbk9yTZDbJF5Kc3sa+
tK3Ptu0bhl7nmlZ/KMmlQ/WtrTabZNdQfcE5JEmj0x0gSdYBfwxMV9VrgNOAK4GPAddV1SuBJ4Ed
bZcdwJOtfl0bR5LNbb/fBLYCn0pyWpLTgE8ClwGbgXe1sSwxhyRpRJZ7CmsVcEaSVcDLgUeBtwJ3
tO23AFe05W1tnbb94iRp9duq6idV9X1gFrioPWar6ntV9VPgNmBb22exOSRJI9IdIFV1CPgb4D8Z
BMfTwH7gqap6tg07CKxry+uAR9q+z7bx5wzX5+2zWP2cJeZ4jiQ7k8wkmZmbm+v9USVJC1jOKaw1
DI4eNgJrgTMZnIKaGFV1Y1VNV9X01NTUuNuRpJPKck5h/R7w/aqaq6r/Bb4EvBlY3U5pAawHDrXl
Q8AFAG37WcATw/V5+yxWf2KJOSRJI7KcAPlPYEuSl7frEhcDDwJ3Ae9oY7YDd7bl3W2dtv3rVVWt
fmW7S2sjsAn4BnAvsKndcXU6gwvtu9s+i80hSRqR5VwDuYfBhexvAve117oR+HPgQ0lmGVyvuKnt
chNwTqt/CNjVXucB4HYG4fM14Oqq+lm7xvF+YA9wALi9jWWJOSRJI5LBP+hPftPT0zUzM9O179q1
C9cPH15GQ5K0AiTZX1XTC23zk+iSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroY
IJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroY
IJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuiwrQJKsTnJHku8mOZDkTUnOTrI3
ycPteU0bmyTXJ5lN8p0kFw69zvY2/uEk24fqb0hyX9vn+iRp9QXnkCSNznKPQD4BfK2qXg28FjgA
7AL2VdUmYF9bB7gM2NQeO4EbYBAGwLXAG4GLgGuHAuEG4L1D+21t9cXmkCSNSHeAJDkL+G3gJoCq
+mlVPQVsA25pw24BrmjL24Bba+BuYHWS84FLgb1VdbSqngT2AlvbtldU1d1VVcCt815roTkkSSOy
nCOQjcAc8Jkk30ry6SRnAudV1aNtzGPAeW15HfDI0P4HW22p+sEF6iwxhyRpRJYTIKuAC4Ebqur1
wP8w71RSO3KoZczxvJaaI8nOJDNJZubm5l7MNiTplLOcADkIHKyqe9r6HQwC5fF2+on2fKRtPwRc
MLT/+lZbqr5+gTpLzPEcVXVjVU1X1fTU1FTXDylJWlh3gFTVY8AjSV7VShcDDwK7gWN3Um0H7mzL
u4Gr2t1YW4Cn22moPcAlSda0i+eXAHvatmeSbGl3X10177UWmkOSNCKrlrn/HwGfS3I68D3gPQxC
6fYkO4AfAu9sY78CXA7MAj9uY6mqo0k+Atzbxn24qo625fcBnwXOAL7aHgAfXWSOibB27eLbDh8e
XR+S9GLK4BLCyW96erpmZma69l0sEBYLAwNE0skiyf6qml5om59ElyR1MUAkSV0MEElSFwNEktTF
AJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTF
AJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTF
AJEkdVl2gCQ5Lcm3kny5rW9Mck+S2SRfSHJ6q7+0rc+27RuGXuOaVn8oyaVD9a2tNptk11B9wTkk
SaNzIo5APgAcGFr/GHBdVb0SeBLY0eo7gCdb/bo2jiSbgSuB3wS2Ap9qoXQa8EngMmAz8K42dqk5
JEkjsqwASbIe+APg0209wFuBO9qQW4Ar2vK2tk7bfnEbvw24rap+UlXfB2aBi9pjtqq+V1U/BW4D
tj3PHJKkEVnuEcjfAn8G/LytnwM8VVXPtvWDwLq2vA54BKBtf7qN/0V93j6L1Zea4zmS7Ewyk2Rm
bm6u92eUJC2gO0CSvA04UlX7T2A/J1RV3VhV01U1PTU1Ne52JOmksmoZ+74ZeHuSy4GXAa8APgGs
TrKqHSGsBw618YeAC4CDSVYBZwFPDNWPGd5nofoTS8whSRqR7iOQqrqmqtZX1QYGF8G/XlXvBu4C
3tGGbQfubMu72zpt+9erqlr9ynaX1kZgE/AN4F5gU7vj6vQ2x+62z2JzSJJG5MX4HMifAx9KMsvg
esVNrX4TcE6rfwjYBVBVDwC3Aw8CXwOurqqftaOL9wN7GNzldXsbu9QckqQRyeAf9Ce/6enpmpmZ
6dp37dqF64cPH9/4pfaRpEmUZH9VTS+0zU+iS5K6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogk
qYsBIknqYoBIkros58sUT3lLfeJckk52HoFIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSp
iwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLfw9kQiz2t0UOHx5tH5L0QnkE
IknqYoBIkroYIJKkLt0BkuSCJHcleTDJA0k+0OpnJ9mb5OH2vKbVk+T6JLNJvpPkwqHX2t7GP5xk
+1D9DUnua/tcnyRLzSFJGp3lHIE8C/xpVW0GtgBXJ9kM7AL2VdUmYF9bB7gM2NQeO4EbYBAGwLXA
G4GLgGuHAuEG4L1D+21t9cXmkCSNSHeAVNWjVfXNtvxfwAFgHbANuKUNuwW4oi1vA26tgbuB1UnO
By4F9lbV0ap6EtgLbG3bXlFVd1dVAbfOe62F5pAkjcgJuQaSZAPweuAe4LyqerRtegw4ry2vAx4Z
2u1gqy1VP7hAnSXmmN/XziQzSWbm5uaO/weTJC1q2QGS5FeBLwIfrKpnhre1I4da7hxLWWqOqrqx
qqaranpqaurFbEOSTjnLCpAkL2EQHp+rqi+18uPt9BPt+UirHwIuGNp9fastVV+/QH2pOSRJI7Kc
u7AC3AQcqKqPD23aDRy7k2o7cOdQ/ap2N9YW4Ol2GmoPcEmSNe3i+SXAnrbtmSRb2lxXzXutheaQ
JI3Icr7K5M3AHwL3Jfl2q/0F8FHg9iQ7gB8C72zbvgJcDswCPwbeA1BVR5N8BLi3jftwVR1ty+8D
PgucAXy1PVhiDknSiHQHSFX9C5BFNl+8wPgCrl7ktW4Gbl6gPgO8ZoH6EwvNIUkaHT+JLknqYoBI
kroYIJKkLv49kBFb7O9+SNJK4xGIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQu
BogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC7+QakJd7x/gOrw4RenD0mazyMQ
SVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldVmyAJNma5KEks0l2jbsfSTrVrMhP
oic5Dfgk8PvAQeDeJLur6sHxdjZ+i31y3U+oSzrRVmSAABcBs1X1PYAktwHbgFM+QBZzvF+JshiD
SNIxKzVA1gGPDK0fBN44f1CSncDOtvrfSR56ga9/LvCjZXU4GiPvM+nabSX8PldCj2CfJ9JK6BHG
3+evL7ZhpQbIC1JVNwI3Hu9+SWaqavpFaOmEss8TZyX0CPZ5Iq2EHmGy+1ypF9EPARcMra9vNUnS
iKzUALkX2JRkY5LTgSuB3WPuSZJOKSvyFFZVPZvk/cAe4DTg5qp64AROcdynvcbEPk+cldAj2OeJ
tBJ6hAnuM1U17h4kSSvQSj2FJUkaMwNEktTFAJlnUr8iJcnNSY4kuX+odnaSvUkebs9rxtzjBUnu
SvJgkgeSfGBC+3xZkm8k+bfW51+3+sYk97T3/gvtBo2xSnJakm8l+fIE9/iDJPcl+XaSmVabqPe8
9bQ6yR1JvpvkQJI3TVKfSV7VfofHHs8k+eAk9TifATJk6CtSLgM2A+9Ksnm8Xf3CZ4Gt82q7gH1V
tQnY19bH6VngT6tqM7AFuLr9/iatz58Ab62q1wKvA7Ym2QJ8DLiuql4JPAnsGGOPx3wAODC0Pok9
AvxuVb1u6PMKk/aeA3wC+FpVvRp4LYPf68T0WVUPtd/h64A3AD8G/n6SevwlVeWjPYA3AXuG1q8B
rhl3X0P9bADuH1p/CDi/LZ8PPDTuHuf1eyeD7yub2D6BlwPfZPBNBj8CVi3038KYelvP4P8w3gp8
Gcik9dj6+AFw7rzaRL3nwFnA92k3Dk1qn0N9XQL86yT3WFUegcyz0FekrBtTLy/EeVX1aFt+DDhv
nM0MS7IBeD1wDxPYZzs19G3gCLAX+A/gqap6tg2ZhPf+b4E/A37e1s9h8noEKOAfk+xvXx8Ek/ee
bwTmgM+0U4KfTnImk9fnMVcCn2/Lk9qjAXKyqME/Tybinuwkvwp8EfhgVT0zvG1S+qyqn9XgVMF6
Bl/O+eoxt/QcSd4GHKmq/ePu5QV4S1VdyODU79VJfnt444S856uAC4Ebqur1wP8w71TQhPRJu671
duDv5m+blB6PMUCea6V9RcrjSc4HaM9HxtwPSV7CIDw+V1VfauWJ6/OYqnoKuIvB6aDVSY59uHbc
7/2bgbcn+QFwG4PTWJ9gsnoEoKoOtecjDM7ZX8TkvecHgYNVdU9bv4NBoExanzAI4m9W1eNtfRJ7
BAyQ+VbaV6TsBra35e0MrjmMTZIANwEHqurjQ5smrc+pJKvb8hkMrtMcYBAk72jDxtpnVV1TVeur
agOD/w6/XlXvZoJ6BEhyZpJfO7bM4Nz9/UzYe15VjwGPJHlVK13M4M8/TFSfzbv4/9NXMJk9Doz7
IsykPYDLgX9ncE78L8fdz1BfnwceBf6Xwb+mdjA4J74PeBj4J+DsMff4FgaH198Bvt0el09gn78F
fKv1eT/wV63+G8A3gFkGpw9eOu73vfX1O8CXJ7HH1s+/tccDx/43M2nveevpdcBMe9//AVgzaX0C
ZwJPAGcN1Saqx+GHX2UiSeriKSxJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1+T+VmtYx
RwH1zAAAAABJRU5ErkJggg==
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As you can see, there are leftover HTML elements, with &lt; and &gt; occurring quite often in the comments dataset, which may make it harder for your model to learn to generate comments containing those elements. Luckily, this won't hurt your model's accuracy much, but exploring your data like this lets you see how it may be influencing your model.</p>
<p><strong>TODO For You:</strong> Perform some further cleaning steps to remove HTML and any other cleaning you deem necessary and see how your performance changes.</p>
</div>
</div>
</div>
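<p>If you want a starting point for the cleaning step, here is one possible sketch using only the standard library. It is an assumption about what "remove HTML" might look like, not the notebook's method: a regex tag-stripper plus entity decoding, which is crude but often good enough for docstrings.</p>

```python
import re
from html import unescape

def strip_html(comment):
    """Drop HTML tags, decode entities, and normalize whitespace in a comment."""
    no_tags = re.sub(r"<[^>]+>", " ", comment)  # remove <tag ...> spans
    text = unescape(no_tags)                    # decode entities like &amp;
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

print(strip_html("Returns the <code>id</code> of the user &amp; logs it."))
```

<p>A fuller solution might use an HTML parser instead of a regex, since regexes can mishandle nested or malformed markup; for short docstring comments, though, this kind of pass is usually sufficient.</p>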
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Loading-the-data-using-FastAI">Loading the data using FastAI<a class="anchor-link" href="#Loading-the-data-using-FastAI"> </a></h2><p>Now that the data is processed and cleaned, you need a way to get it into the format FastAI uses. For that you will borrow some code from Rachel Thomas' awesome <a href="https://github.com/fastai/course-nlp/blob/master/7-seq2seq-translation.ipynb">course on NLP</a>, which creates a sequence-to-sequence DataBunch, the container FastAI uses to load data into memory for training and evaluation. It is sequence-to-sequence because you are mapping from the sequence of code tokens to the sequence of the code's docstring tokens.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">seq2seq_collate</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">pad_idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">pad_first</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">backwards</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="s2">"Function that collects samples and adds padding. Flips token order if needed."</span>
<span class="n">samples</span> <span class="o">=</span> <span class="n">to_data</span><span class="p">(</span><span class="n">samples</span><span class="p">)</span>
<span class="n">max_len_x</span><span class="p">,</span><span class="n">max_len_y</span> <span class="o">=</span> <span class="nb">max</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">samples</span><span class="p">]),</span><span class="nb">max</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">samples</span><span class="p">])</span>
<span class="n">res_x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">),</span> <span class="n">max_len_x</span><span class="p">)</span><span class="o">.</span><span class="n">long</span><span class="p">()</span> <span class="o">+</span> <span class="n">pad_idx</span>
<span class="n">res_y</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">),</span> <span class="n">max_len_y</span><span class="p">)</span><span class="o">.</span><span class="n">long</span><span class="p">()</span> <span class="o">+</span> <span class="n">pad_idx</span>
<span class="k">if</span> <span class="n">backwards</span><span class="p">:</span> <span class="n">pad_first</span> <span class="o">=</span> <span class="ow">not</span> <span class="n">pad_first</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">samples</span><span class="p">):</span>
<span class="k">if</span> <span class="n">pad_first</span><span class="p">:</span>
<span class="n">res_x</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]):],</span><span class="n">res_y</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]):]</span> <span class="o">=</span> <span class="n">LongTensor</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span><span class="n">LongTensor</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">res_x</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">])],</span><span class="n">res_y</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])]</span> <span class="o">=</span> <span class="n">LongTensor</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span><span class="n">LongTensor</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">if</span> <span class="n">backwards</span><span class="p">:</span> <span class="n">res_x</span><span class="p">,</span><span class="n">res_y</span> <span class="o">=</span> <span class="n">res_x</span><span class="o">.</span><span class="n">flip</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="n">res_y</span><span class="o">.</span><span class="n">flip</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">res_x</span><span class="p">,</span> <span class="n">res_y</span>
<span class="k">class</span> <span class="nc">Seq2SeqDataBunch</span><span class="p">(</span><span class="n">TextDataBunch</span><span class="p">):</span>
<span class="s2">"Create a `TextDataBunch` suitable for training a seq2seq model."</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">create</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">train_ds</span><span class="p">,</span> <span class="n">valid_ds</span><span class="p">,</span> <span class="n">test_ds</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">bs</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">val_bs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">pad_idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">dl_tfms</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">pad_first</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">no_check</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">backwards</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">dl_kwargs</span><span class="p">):</span>
<span class="s2">"Function that transforms the `datasets` into a `DataBunch` for seq2seq training. Passes `**dl_kwargs` on to `DataLoader()`"</span>
<span class="n">datasets</span> <span class="o">=</span> <span class="bp">cls</span><span class="o">.</span><span class="n">_init_ds</span><span class="p">(</span><span class="n">train_ds</span><span class="p">,</span> <span class="n">valid_ds</span><span class="p">,</span> <span class="n">test_ds</span><span class="p">)</span>
<span class="n">val_bs</span> <span class="o">=</span> <span class="n">ifnone</span><span class="p">(</span><span class="n">val_bs</span><span class="p">,</span> <span class="n">bs</span><span class="p">)</span>
<span class="n">collate_fn</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">seq2seq_collate</span><span class="p">,</span> <span class="n">pad_idx</span><span class="o">=</span><span class="n">pad_idx</span><span class="p">,</span> <span class="n">pad_first</span><span class="o">=</span><span class="n">pad_first</span><span class="p">,</span> <span class="n">backwards</span><span class="o">=</span><span class="n">backwards</span><span class="p">)</span>
<span class="n">train_sampler</span> <span class="o">=</span> <span class="n">SortishSampler</span><span class="p">(</span><span class="n">datasets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">t</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">datasets</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">t</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">data</span><span class="p">),</span> <span class="n">bs</span><span class="o">=</span><span class="n">bs</span><span class="o">//</span><span class="mi">2</span><span class="p">)</span>
<span class="n">train_dl</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">datasets</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">bs</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">train_sampler</span><span class="p">,</span> <span class="n">drop_last</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">dl_kwargs</span><span class="p">)</span>
<span class="n">dataloaders</span> <span class="o">=</span> <span class="p">[</span><span class="n">train_dl</span><span class="p">]</span>
<span class="k">for</span> <span class="n">ds</span> <span class="ow">in</span> <span class="n">datasets</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
<span class="n">lengths</span> <span class="o">=</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">ds</span><span class="o">.</span><span class="n">x</span><span class="o">.</span><span class="n">items</span><span class="p">]</span>
<span class="n">sampler</span> <span class="o">=</span> <span class="n">SortSampler</span><span class="p">(</span><span class="n">ds</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">lengths</span><span class="o">.</span><span class="fm">__getitem__</span><span class="p">)</span>
<span class="n">dataloaders</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">DataLoader</span><span class="p">(</span><span class="n">ds</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">val_bs</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">sampler</span><span class="p">,</span> <span class="o">**</span><span class="n">dl_kwargs</span><span class="p">))</span>
<span class="k">return</span> <span class="bp">cls</span><span class="p">(</span><span class="o">*</span><span class="n">dataloaders</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="n">path</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">collate_fn</span><span class="p">,</span> <span class="n">no_check</span><span class="o">=</span><span class="n">no_check</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Seq2SeqTextList</span><span class="p">(</span><span class="n">TextList</span><span class="p">):</span>
<span class="n">_bunch</span> <span class="o">=</span> <span class="n">Seq2SeqDataBunch</span>
<span class="n">_label_cls</span> <span class="o">=</span> <span class="n">TextList</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
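To see the core idea of the collate function in isolation, here is a plain-PyTorch illustration of what it does for one side of the batch: pad every sequence up to the longest one, either on the left (`pad_first=True`) or the right (the function name and toy token values are mine, not the fastai code above):

```python
import torch

def pad_batch(seqs, pad_idx=1, pad_first=True):
    # Pad every sequence to the length of the longest one in the batch.
    max_len = max(len(s) for s in seqs)
    out = torch.full((len(seqs), max_len), pad_idx, dtype=torch.long)
    for i, s in enumerate(seqs):
        if pad_first:
            out[i, -len(s):] = torch.tensor(s)  # padding goes on the left
        else:
            out[i, :len(s)] = torch.tensor(s)   # padding goes on the right
    return out

print(pad_batch([[5, 6, 7], [8, 9]]))
# tensor([[5, 6, 7],
#         [1, 8, 9]])
```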
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Here is where you tell FastAI to use your trained BPE models for tokenizing your data. FastAI's tokenizers will also do some additional processing of your text, such as lowercasing words, marking repeated characters, etc. You can find a full list of the processing FastAI applies <a href="https://docs.fast.ai/text.transform.html#Tokenizer">here</a>.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">method_processor</span> <span class="o">=</span> <span class="n">SPProcessor</span><span class="p">(</span>
<span class="n">sp_model</span> <span class="o">=</span> <span class="n">path</span> <span class="o">/</span> <span class="p">(</span><span class="n">method_tokenizer</span> <span class="o">+</span> <span class="s2">".model"</span><span class="p">),</span>
<span class="n">sp_vocab</span> <span class="o">=</span> <span class="n">path</span> <span class="o">/</span> <span class="p">(</span><span class="n">method_tokenizer</span> <span class="o">+</span> <span class="s2">".vocab"</span><span class="p">),</span>
<span class="n">include_eos</span> <span class="o">=</span> <span class="kc">True</span><span class="p">)</span>
<span class="n">comment_processor</span> <span class="o">=</span> <span class="n">SPProcessor</span><span class="p">(</span>
<span class="n">sp_model</span> <span class="o">=</span> <span class="n">path</span> <span class="o">/</span> <span class="p">(</span><span class="n">comment_tokenizer</span> <span class="o">+</span> <span class="s2">".model"</span><span class="p">),</span>
<span class="n">sp_vocab</span> <span class="o">=</span> <span class="n">path</span> <span class="o">/</span> <span class="p">(</span><span class="n">comment_tokenizer</span> <span class="o">+</span> <span class="s2">".vocab"</span><span class="p">),</span>
<span class="n">include_eos</span> <span class="o">=</span> <span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now that you have your BPE model, you will generate the DataBunches suitable for your task, which will be the Seq2Seq DataBunch. You will also filter out sequences that are too long, so that everything fits onto a Google Colab GPU and your training doesn't take too long.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">gen_dbs</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">df_tst</span><span class="p">,</span> <span class="n">method_processor</span><span class="p">,</span> <span class="n">comment_processor</span><span class="p">,</span> <span class="n">bs</span> <span class="o">=</span> <span class="mi">96</span><span class="p">,</span> <span class="n">max_seq</span> <span class="o">=</span> <span class="mi">128</span><span class="p">):</span>
<span class="n">is_valid</span> <span class="o">=</span> <span class="p">[</span><span class="kc">False</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_trn</span><span class="p">)</span> <span class="o">+</span> <span class="p">[</span><span class="kc">True</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">df_val</span><span class="p">)</span>
<span class="n">df_merged</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">])</span>
<span class="n">df_merged</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">df_merged</span><span class="p">[</span><span class="s2">"code"</span><span class="p">]</span><span class="o">.</span><span class="n">to_list</span><span class="p">(),</span> <span class="n">df_merged</span><span class="p">[</span><span class="s2">"docstring"</span><span class="p">]</span><span class="o">.</span><span class="n">to_list</span><span class="p">(),</span> <span class="n">is_valid</span><span class="p">),</span>
<span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"code"</span><span class="p">,</span> <span class="s2">"docstring"</span><span class="p">,</span> <span class="s2">"valid"</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">db_trn</span> <span class="o">=</span> <span class="p">(</span><span class="n">Seq2SeqTextList</span>
<span class="o">.</span><span class="n">from_df</span><span class="p">(</span><span class="n">df_merged</span><span class="p">,</span> <span class="n">path</span> <span class="o">=</span> <span class="n">path</span><span class="p">,</span> <span class="n">cols</span><span class="o">=</span><span class="s1">'code'</span><span class="p">,</span> <span class="n">processor</span> <span class="o">=</span> <span class="n">method_processor</span><span class="p">)</span>
<span class="o">.</span><span class="n">split_from_df</span><span class="p">(</span><span class="n">col</span><span class="o">=</span><span class="s1">'valid'</span><span class="p">)</span>
<span class="o">.</span><span class="n">label_from_df</span><span class="p">(</span><span class="n">cols</span><span class="o">=</span><span class="s1">'docstring'</span><span class="p">,</span> <span class="n">label_cls</span><span class="o">=</span><span class="n">TextList</span><span class="p">,</span> <span class="n">processor</span> <span class="o">=</span> <span class="n">comment_processor</span><span class="p">)</span>
<span class="o">.</span><span class="n">filter_by_func</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">></span> <span class="n">max_seq</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="o">></span> <span class="n">max_seq</span><span class="p">)</span>
<span class="o">.</span><span class="n">databunch</span><span class="p">(</span><span class="n">bs</span> <span class="o">=</span> <span class="n">bs</span><span class="p">))</span>
<span class="n">db_tst</span> <span class="o">=</span> <span class="p">(</span><span class="n">Seq2SeqTextList</span>
<span class="o">.</span><span class="n">from_df</span><span class="p">(</span><span class="n">df_tst</span><span class="p">,</span> <span class="n">path</span> <span class="o">=</span> <span class="n">path</span><span class="p">,</span> <span class="n">cols</span><span class="o">=</span><span class="s1">'code'</span><span class="p">,</span> <span class="n">processor</span> <span class="o">=</span> <span class="n">method_processor</span><span class="p">)</span>
<span class="o">.</span><span class="n">split_by_rand_pct</span><span class="p">(</span><span class="n">valid_pct</span> <span class="o">=</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="o">.</span><span class="n">label_from_df</span><span class="p">(</span><span class="n">cols</span><span class="o">=</span><span class="s1">'docstring'</span><span class="p">,</span> <span class="n">label_cls</span><span class="o">=</span><span class="n">TextList</span><span class="p">,</span> <span class="n">processor</span> <span class="o">=</span> <span class="n">comment_processor</span><span class="p">)</span>
<span class="o">.</span><span class="n">filter_by_func</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">></span> <span class="n">max_seq</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="o">></span> <span class="n">max_seq</span><span class="p">)</span>
<span class="o">.</span><span class="n">databunch</span><span class="p">(</span><span class="n">bs</span> <span class="o">=</span> <span class="mi">16</span><span class="p">))</span>
<span class="k">return</span> <span class="n">db_trn</span><span class="p">,</span> <span class="n">db_tst</span>
<span class="n">db_trn</span><span class="p">,</span> <span class="n">db_tst</span> <span class="o">=</span> <span class="n">gen_dbs</span><span class="p">(</span><span class="n">df_trn</span><span class="p">,</span> <span class="n">df_val</span><span class="p">,</span> <span class="n">df_tst</span><span class="p">,</span> <span class="n">method_processor</span><span class="p">,</span> <span class="n">comment_processor</span><span class="p">,</span> <span class="n">bs</span> <span class="o">=</span> <span class="mi">96</span><span class="p">,</span> <span class="n">max_seq</span> <span class="o">=</span> <span class="mi">128</span><span class="p">)</span>
<span class="n">db_trn</span><span class="o">.</span><span class="n">show_batch</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea ">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>text</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<td>▁ xx b os ▁boolean ▁res er ve ( int ▁column , ▁int ▁size ) ▁{ ▁if ▁( ( column ▁< ▁0) ▁|| ▁( ( column ▁+ ▁size ) ▁> ▁columns )) ▁throw ▁new ▁index out of bound s exception (" res er ve ▁- ▁ inc or rec t ▁column ▁/ ▁size "); ▁for ( int ▁i = column ; ▁i ▁< ▁column ▁+ ▁size ; ▁i ++) ▁{</td>
<td>▁ xx b os ▁ xx ma j ▁re s er ve s ▁a ▁< code > ce ll < ▁/ ▁ xx up ▁code > ▁in ▁the ▁< code > ro w < ▁/ ▁ xx up ▁code > . ▁ xx e os</td>
</tr>
<tr>
<td>▁ xx b os ▁@ help ( ▁ help ▁= ▁" get ▁all ▁the ▁ virtual network function descriptor ▁ xx m a j ▁dependency ▁of ▁a ▁network service descriptor ▁with ▁specific ▁id " ▁) ▁public ▁list < v n f d ep end en cy > ▁get v n f dependencies ( final ▁ xx m a j ▁string ▁id ns d ) ▁throws ▁s d k exception ▁{</td>
<td>▁ xx b os ▁ xx ma j ▁return ▁a ▁ xx ma j ▁list ▁with ▁all ▁the ▁v n f de p end en c ies ▁that ▁are ▁contain ed ▁in ▁a ▁specific ▁network service descriptor . ▁ xx e os</td>
</tr>
<tr>
<td>▁ xx b os ▁@ override ▁public ▁void ▁delete as set and attachment s ( final ▁ xx m a j ▁string ▁as set id ) ▁throws ▁ io exception , ▁request failure exception ▁{ ▁ xx m a j ▁as set ▁as s ▁= ▁get un v er ified as set ( as set id ); ▁list < attachment > ▁attachment s ▁= ▁as s . get attachment s</td>
<td>▁ xx b os ▁ xx ma j ▁this ▁will ▁delete ▁an ▁asset ▁and ▁all ▁its ▁attachments ▁ xx e os</td>
</tr>
<tr>
<td>▁ xx b os ▁public ▁list < character book mark fold er s response > ▁get character s character id book mark s fold er s ( integer ▁character id , ▁ xx m a j ▁string ▁data source , ▁ xx m a j ▁string ▁if n one match , ▁ xx m a j ▁ integer ▁page , ▁ xx m a j ▁string ▁token ) ▁throws ▁api</td>
<td>▁ xx b os ▁ xx ma j ▁list ▁ bookmark ▁folders ▁a ▁list ▁of ▁your ▁character & ' s ▁personal ▁ bookmark ▁folders ▁--- ▁ xx ma j ▁this ▁route ▁is ▁cached ▁for ▁up ▁to ▁36 00 ▁seconds ▁ xx up ▁ s so ▁ xx ma j ▁scope : ▁ esi - bookmark s . read _ character _ bookmark s . v 1 ▁ xx e os</td>
</tr>
<tr>
<td>▁ xx b os ▁@ de pre c ated ▁protected ▁final ▁map < db id , ▁k n n list > ▁batch n n ( n ▁node , ▁db id s ▁ids , ▁int ▁k max ) ▁{ ▁map < db id , ▁k n n list > ▁res ▁= ▁new ▁hash map <>( id s . size ()); ▁for ( db id iter ▁iter ▁= ▁ids . iter ();</td>
<td>▁ xx b os ▁ xx ma j ▁perform s ▁a ▁batch ▁k - ne a rest ▁neighbor ▁query ▁for ▁a ▁list ▁of ▁query ▁objects . ▁ xx e os</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
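The batch above shows fastai's special tokens, e.g. <code>xxbos</code> (beginning of sequence) and <code>xxmaj</code> (the next word was capitalized). As a rough illustration of how such markers can be undone when reading model output, here is a tiny helper for just the <code>xxmaj</code> rule (the function name is mine, and fastai's real de-tokenization handles many more rules):

```python
def undo_fastai_caps(tokens):
    # Rough inverse of fastai's `xxmaj` marker: capitalize the token that
    # follows it and drop the marker itself (illustrative only).
    out = []
    cap = False
    for t in tokens:
        if t == "xxmaj":
            cap = True
        else:
            out.append(t.capitalize() if cap else t)
            cap = False
    return out

print(undo_fastai_caps(["xxmaj", "this", "will", "delete", "an", "asset"]))
# ['This', 'will', 'delete', 'an', 'asset']
```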
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">shift_tfm</span><span class="p">(</span><span class="n">b</span><span class="p">):</span>
<span class="n">x</span><span class="p">,</span><span class="n">y</span> <span class="o">=</span> <span class="n">b</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">pad</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">value</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">[:,:</span><span class="o">-</span><span class="mi">1</span><span class="p">]],</span> <span class="n">y</span><span class="p">[:,</span><span class="mi">1</span><span class="p">:]</span>
<span class="c1"># Add the necessary shift transformation for training your Transformer model</span>
<span class="n">db_trn</span><span class="o">.</span><span class="n">add_tfm</span><span class="p">(</span><span class="n">shift_tfm</span><span class="p">)</span>
<span class="n">db_tst</span><span class="o">.</span><span class="n">add_tfm</span><span class="p">(</span><span class="n">shift_tfm</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
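To see concretely what this shift transform does, here is the same function run on a toy batch (the token values are made up; index 1 plays the pad/bos role, as in the notebook): the decoder is fed the target shifted right by one token, so at each position it predicts the next token of the original sequence, which is teacher forcing.

```python
import torch
import torch.nn.functional as F

def shift_tfm(b):
    # Prepend token 1 to the target, feed the shifted target to the decoder,
    # and predict the unshifted target.
    x, y = b
    y = F.pad(y, (1, 0), value=1)
    return [x, y[:, :-1]], y[:, 1:]

x = torch.tensor([[5, 6, 7]])
y = torch.tensor([[2, 3, 4]])
(enc_in, dec_in), target = shift_tfm((x, y))
print(dec_in)   # tensor([[1, 2, 3]])  decoder input starts with token 1
print(target)   # tensor([[2, 3, 4]])  target is the original sequence
```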
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Defining-your-model">Defining your model<a class="anchor-link" href="#Defining-your-model"> </a></h1><p>In this example, you will be using the Transformer architecture that was developed by <a href="https://arxiv.org/abs/1706.03762">Vaswani et al.</a>. If you want a better understanding of this model, I highly suggest <a href="http://nlp.seas.harvard.edu/2018/04/03/attention.html#applications-of-attention-in-our-model">The Annotated Transformer</a> blog post and the <a href="https://www.youtube.com/playlist?list=PLtmWHNX-gukKocXQOkQjuVxglSDYWsSh9">NLP course</a> by Rachel Thomas, from which this model code is copied.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">class</span> <span class="nc">PositionalEncoding</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
<span class="s2">"Encode the position with a sinusoid."</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">d</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">register_buffer</span><span class="p">(</span><span class="s1">'freq'</span><span class="p">,</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="mi">10000</span> <span class="o">**</span> <span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mf">0.</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="mf">2.</span><span class="p">)</span><span class="o">/</span><span class="n">d</span><span class="p">)))</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos</span><span class="p">):</span>
<span class="n">inp</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ger</span><span class="p">(</span><span class="n">pos</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">freq</span><span class="p">)</span>
<span class="n">enc</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">([</span><span class="n">inp</span><span class="o">.</span><span class="n">sin</span><span class="p">(),</span> <span class="n">inp</span><span class="o">.</span><span class="n">cos</span><span class="p">()],</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">enc</span>
<span class="k">class</span> <span class="nc">TransformerEmbedding</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
<span class="s2">"Embedding + positional encoding + dropout"</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocab_sz</span><span class="p">,</span> <span class="n">emb_sz</span><span class="p">,</span> <span class="n">inp_p</span><span class="o">=</span><span class="mf">0.</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">emb_sz</span> <span class="o">=</span> <span class="n">emb_sz</span>
<span class="bp">self</span><span class="o">.</span><span class="n">embed</span> <span class="o">=</span> <span class="n">embedding</span><span class="p">(</span><span class="n">vocab_sz</span><span class="p">,</span> <span class="n">emb_sz</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">pos_enc</span> <span class="o">=</span> <span class="n">PositionalEncoding</span><span class="p">(</span><span class="n">emb_sz</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">drop</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">inp_p</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inp</span><span class="p">):</span>
<span class="n">pos</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">inp</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="n">inp</span><span class="o">.</span><span class="n">device</span><span class="p">)</span><span class="o">.</span><span class="n">float</span><span class="p">()</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">drop</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">embed</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">emb_sz</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos_enc</span><span class="p">(</span><span class="n">pos</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">feed_forward</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">d_ff</span><span class="p">,</span> <span class="n">ff_p</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="n">layers</span> <span class="o">=</span> <span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">d_ff</span><span class="p">),</span> <span class="n">nn</span><span class="o">.</span><span class="n">ReLU</span><span class="p">()]</span>
<span class="k">if</span> <span class="n">double_drop</span><span class="p">:</span> <span class="n">layers</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">ff_p</span><span class="p">))</span>
<span class="k">return</span> <span class="n">SequentialEx</span><span class="p">(</span><span class="o">*</span><span class="n">layers</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">d_ff</span><span class="p">,</span> <span class="n">d_model</span><span class="p">),</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">ff_p</span><span class="p">),</span> <span class="n">MergeLayer</span><span class="p">(),</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="n">d_model</span><span class="p">))</span>
<span class="k">class</span> <span class="nc">MultiHeadAttention</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="n">d_head</span> <span class="o">=</span> <span class="n">ifnone</span><span class="p">(</span><span class="n">d_head</span><span class="p">,</span> <span class="n">d_model</span><span class="o">//</span><span class="n">n_heads</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">n_heads</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">d_head</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">scale</span> <span class="o">=</span> <span class="n">n_heads</span><span class="p">,</span><span class="n">d_head</span><span class="p">,</span><span class="n">scale</span>
<span class="bp">self</span><span class="o">.</span><span class="n">q_wgt</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">k_wgt</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">v_wgt</span> <span class="o">=</span> <span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span>
<span class="n">d_model</span><span class="p">,</span> <span class="n">n_heads</span> <span class="o">*</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="n">bias</span><span class="p">)</span> <span class="k">for</span> <span class="n">o</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">out</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">n_heads</span> <span class="o">*</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="n">bias</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">drop_att</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">drop_res</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">p</span><span class="p">),</span><span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">ln</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="n">d_model</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">q</span><span class="p">,</span> <span class="n">kv</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">ln</span><span class="p">(</span><span class="n">q</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">drop_res</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">out</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_apply_attention</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">kv</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="n">mask</span><span class="p">))))</span>
<span class="k">def</span> <span class="nf">create_attn_mat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">layer</span><span class="p">,</span> <span class="n">bs</span><span class="p">):</span>
<span class="k">return</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="n">bs</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_heads</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_head</span>
<span class="p">)</span><span class="o">.</span><span class="n">permute</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_apply_attention</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">q</span><span class="p">,</span> <span class="n">kv</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="n">bs</span><span class="p">,</span><span class="n">seq_len</span> <span class="o">=</span> <span class="n">q</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="n">q</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">wq</span><span class="p">,</span><span class="n">wk</span><span class="p">,</span><span class="n">wv</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">o</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">create_attn_mat</span><span class="p">(</span><span class="o">*</span><span class="n">o</span><span class="p">,</span><span class="n">bs</span><span class="p">),</span>
<span class="nb">zip</span><span class="p">((</span><span class="n">q</span><span class="p">,</span><span class="n">kv</span><span class="p">,</span><span class="n">kv</span><span class="p">),(</span><span class="bp">self</span><span class="o">.</span><span class="n">q_wgt</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">k_wgt</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">v_wgt</span><span class="p">)))</span>
<span class="n">attn_score</span> <span class="o">=</span> <span class="n">wq</span> <span class="o">@</span> <span class="n">wk</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">)</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">scale</span><span class="p">:</span> <span class="n">attn_score</span> <span class="o">/=</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">d_head</span><span class="p">)</span>
<span class="k">if</span> <span class="n">mask</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">attn_score</span> <span class="o">=</span> <span class="n">attn_score</span><span class="o">.</span><span class="n">float</span><span class="p">()</span><span class="o">.</span><span class="n">masked_fill</span><span class="p">(</span><span class="n">mask</span><span class="p">,</span> <span class="o">-</span><span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">))</span><span class="o">.</span><span class="n">type_as</span><span class="p">(</span><span class="n">attn_score</span><span class="p">)</span>
<span class="n">attn_prob</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">drop_att</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">attn_score</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">))</span>
<span class="n">attn_vec</span> <span class="o">=</span> <span class="n">attn_prob</span> <span class="o">@</span> <span class="n">wv</span>
<span class="k">return</span> <span class="n">attn_vec</span><span class="o">.</span><span class="n">permute</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="n">bs</span><span class="p">,</span> <span class="n">seq_len</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_output_mask</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="n">pad_idx</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">triu</span><span class="p">(</span><span class="n">inp</span><span class="o">.</span><span class="n">new_ones</span><span class="p">(</span><span class="n">inp</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="n">inp</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)),</span> <span class="n">diagonal</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="kc">None</span><span class="p">,</span><span class="kc">None</span><span class="p">]</span><span class="o">.</span><span class="n">byte</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">EncoderBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
<span class="s2">"Encoder block of a Transformer model."</span>
<span class="c1">#Can't use Sequential directly cause more than one input...</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">d_inner</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">mha</span> <span class="o">=</span> <span class="n">MultiHeadAttention</span><span class="p">(</span><span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="n">bias</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">scale</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">ff</span> <span class="o">=</span> <span class="n">feed_forward</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">d_inner</span><span class="p">,</span> <span class="n">ff_p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="n">double_drop</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">ff</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mha</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="n">mask</span><span class="p">))</span>
<span class="k">class</span> <span class="nc">DecoderBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
<span class="s2">"Decoder block of a Transformer model."</span>
<span class="c1">#Can't use Sequential directly cause more than one input...</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">d_inner</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">mha1</span> <span class="o">=</span> <span class="n">MultiHeadAttention</span><span class="p">(</span><span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="n">bias</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">scale</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">mha2</span> <span class="o">=</span> <span class="n">MultiHeadAttention</span><span class="p">(</span><span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="n">bias</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">scale</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">ff</span> <span class="o">=</span> <span class="n">feed_forward</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">d_inner</span><span class="p">,</span> <span class="n">ff_p</span><span class="o">=</span><span class="n">p</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="n">double_drop</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span> <span class="n">mask_out</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">ff</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mha2</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mha1</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">mask_out</span><span class="p">),</span> <span class="n">enc</span><span class="p">))</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">class</span> <span class="nc">Transformer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inp_vsz</span><span class="p">,</span> <span class="n">out_vsz</span><span class="p">,</span> <span class="n">n_layers</span><span class="o">=</span><span class="mi">6</span><span class="p">,</span> <span class="n">n_heads</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">d_model</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">d_head</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">d_inner</span><span class="o">=</span><span class="mi">1024</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">double_drop</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">pad_idx</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">enc_emb</span> <span class="o">=</span> <span class="n">TransformerEmbedding</span><span class="p">(</span><span class="n">inp_vsz</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">dec_emb</span> <span class="o">=</span> <span class="n">TransformerEmbedding</span><span class="p">(</span><span class="n">out_vsz</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="mf">0.</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">(</span><span class="n">n_heads</span><span class="p">,</span> <span class="n">d_model</span><span class="p">,</span> <span class="n">d_head</span><span class="p">,</span> <span class="n">d_inner</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">scale</span><span class="p">,</span> <span class="n">double_drop</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">encoder</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">EncoderBlock</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_layers</span><span class="p">)])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">decoder</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">DecoderBlock</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_layers</span><span class="p">)])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">out</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">out_vsz</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">out</span><span class="o">.</span><span class="n">weight</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dec_emb</span><span class="o">.</span><span class="n">embed</span><span class="o">.</span><span class="n">weight</span>
<span class="bp">self</span><span class="o">.</span><span class="n">pad_idx</span> <span class="o">=</span> <span class="n">pad_idx</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inp</span><span class="p">,</span> <span class="n">out</span><span class="p">):</span>
<span class="n">mask_out</span> <span class="o">=</span> <span class="n">get_output_mask</span><span class="p">(</span><span class="n">out</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">pad_idx</span><span class="p">)</span>
<span class="n">enc</span><span class="p">,</span><span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">enc_emb</span><span class="p">(</span><span class="n">inp</span><span class="p">),</span><span class="bp">self</span><span class="o">.</span><span class="n">dec_emb</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
<span class="n">enc</span> <span class="o">=</span> <span class="n">compose</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">encoder</span><span class="p">)(</span><span class="n">enc</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">compose</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">decoder</span><span class="p">)(</span><span class="n">out</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span> <span class="n">mask_out</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">out</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To evaluate your model you will use the BLEU score, a standard metric that measures how closely your model's generated comment matches the real comment of a method. (This code is also adapted from Rachel Thomas's NLP tutorial.)</p>
</div>
</div>
</div>
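<p>In brief, BLEU combines clipped n-gram precisions (how many of the prediction's n-grams also appear in the reference, counts capped at the reference's counts) with a brevity penalty that punishes predictions shorter than the reference. As a rough single-reference illustration (a simplified sketch, not the tutorial's implementation — the <code>bleu</code> helper below is hypothetical):</p>

```python
import math
from collections import Counter

def bleu(pred, targ, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty. pred/targ are token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        pred_grams = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
        targ_grams = Counter(tuple(targ[i:i + n]) for i in range(len(targ) - n + 1))
        overlap = sum((pred_grams & targ_grams).values())  # clipped counts
        total = max(sum(pred_grams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:  # no smoothing in this sketch
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(targ) / len(pred)))  # brevity penalty
    return bp * math.exp(log_avg)

print(bleu("the cat sat on the mat".split(),
           "the cat sat on the mat".split()))  # → 1.0
```

<p>Real implementations (e.g. corpus-level BLEU) aggregate counts over the whole test set and apply smoothing so a single missing n-gram order does not zero out the score.</p>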
<div class="cell border-box-sizing code_cell rendered">
<details class="description">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">class</span> <span class="nc">NGram</span><span class="p">():</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ngram</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="mi">5000</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">ngram</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">max_n</span> <span class="o">=</span> <span class="n">ngram</span><span class="p">,</span><span class="n">max_n</span>
<span class="k">def</span> <span class="fm">__eq__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">ngram</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">other</span><span class="o">.</span><span class="n">ngram</span><span class="p">):</span> <span class="k">return</span> <span class="kc">False</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">all</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">ngram</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">other</span><span class="o">.</span><span class="n">ngram</span><span class="p">))</span>
<span class="k">def</span> <span class="fm">__hash__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">o</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_n</span><span class="o">**</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">o</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">ngram</span><span class="p">)]))</span>
<span class="k">def</span> <span class="nf">get_grams</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="mi">5000</span><span class="p">):</span>
<span class="k">return</span> <span class="n">x</span> <span class="k">if</span> <span class="n">n</span><span class="o">==</span><span class="mi">1</span> <span class="k">else</span> <span class="p">[</span><span class="n">NGram</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="n">n</span><span class="p">],</span> <span class="n">max_n</span><span class="o">=</span><span class="n">max_n</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">-</span><span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
<span class="k">def</span> <span class="nf">get_correct_ngrams</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span> <span class="n">targ</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="mi">5000</span><span class="p">):</span>
<span class="n">pred_grams</span><span class="p">,</span><span class="n">targ_grams</span> <span class="o">=</span> <span class="n">get_grams</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="n">max_n</span><span class="p">),</span><span class="n">get_grams</span><span class="p">(</span><span class="n">targ</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="n">max_n</span><span class="p">)</span>
<span class="n">pred_cnt</span><span class="p">,</span><span class="n">targ_cnt</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">pred_grams</span><span class="p">),</span><span class="n">Counter</span><span class="p">(</span><span class="n">targ_grams</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">sum</span><span class="p">([</span><span class="nb">min</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">targ_cnt</span><span class="p">[</span><span class="n">g</span><span class="p">])</span> <span class="k">for</span> <span class="n">g</span><span class="p">,</span><span class="n">c</span> <span class="ow">in</span> <span class="n">pred_cnt</span><span class="o">.</span><span class="n">items</span><span class="p">()]),</span><span class="nb">len</span><span class="p">(</span><span class="n">pred_grams</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">CorpusBLEU</span><span class="p">(</span><span class="n">Callback</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocab_sz</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">vocab_sz</span> <span class="o">=</span> <span class="n">vocab_sz</span>
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'bleu'</span>
<span class="k">def</span> <span class="nf">on_epoch_begin</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">pred_len</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">targ_len</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">corrects</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">counts</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="mi">4</span><span class="p">,[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="mi">4</span>
<span class="k">def</span> <span class="nf">on_batch_end</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">last_output</span><span class="p">,</span> <span class="n">last_target</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">last_output</span> <span class="o">=</span> <span class="n">last_output</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">pred</span><span class="p">,</span><span class="n">targ</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">last_output</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span><span class="n">last_target</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">pred_len</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">pred</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">targ_len</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">targ</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
<span class="n">c</span><span class="p">,</span><span class="n">t</span> <span class="o">=</span> <span class="n">get_correct_ngrams</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span> <span class="n">targ</span><span class="p">,</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_n</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">vocab_sz</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">corrects</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">c</span>
<span class="bp">self</span><span class="o">.</span><span class="n">counts</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">t</span>
<span class="k">def</span> <span class="nf">on_epoch_end</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">last_metrics</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">precs</span> <span class="o">=</span> <span class="p">[</span><span class="n">c</span><span class="o">/</span><span class="n">t</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span><span class="n">t</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">corrects</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">counts</span><span class="p">)]</span>
<span class="n">len_penalty</span> <span class="o">=</span> <span class="n">exp</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">targ_len</span><span class="o">/</span><span class="bp">self</span><span class="o">.</span><span class="n">pred_len</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pred_len</span> <span class="o"><</span> <span class="bp">self</span><span class="o">.</span><span class="n">targ_len</span> <span class="k">else</span> <span class="mi">1</span>
<span class="n">bleu</span> <span class="o">=</span> <span class="n">len_penalty</span> <span class="o">*</span> <span class="p">((</span><span class="n">precs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">precs</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">precs</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">precs</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span> <span class="o">**</span> <span class="mf">0.25</span><span class="p">)</span>
<span class="k">return</span> <span class="n">add_metrics</span><span class="p">(</span><span class="n">last_metrics</span><span class="p">,</span> <span class="n">bleu</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
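The clipped n-gram counting above is the heart of corpus BLEU. As a standalone sanity check, independent of the fastai `Callback` machinery, here is a minimal sketch of the same modified n-gram precision using plain tuples and `collections.Counter` (the function name below is illustrative, not from the notebook):

```python
from collections import Counter

def ngram_precision(pred, targ, n):
    """Clipped n-gram precision: matched n-grams / predicted n-grams.

    Each predicted n-gram counts as correct at most as many times as it
    appears in the target (the "clipping" that BLEU uses to stop a model
    from being rewarded for repeating one good n-gram).
    """
    pred_grams = [tuple(pred[i:i + n]) for i in range(len(pred) - n + 1)]
    targ_grams = [tuple(targ[i:i + n]) for i in range(len(targ) - n + 1)]
    pred_cnt, targ_cnt = Counter(pred_grams), Counter(targ_grams)
    correct = sum(min(c, targ_cnt[g]) for g, c in pred_cnt.items())
    return correct, len(pred_grams)

pred = [1, 2, 3, 4]
targ = [1, 2, 3, 5]
# Bigrams (1,2) and (2,3) match; (3,4) does not.
correct, total = ngram_precision(pred, targ, 2)
print(correct, total)  # → 2 3
```

`CorpusBLEU` above does the same counting per batch for n = 1..4, then combines the four precisions with a geometric mean and a brevity penalty at the end of the epoch.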
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">n_x_vocab</span><span class="p">,</span> <span class="n">n_y_vocab</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">db_trn</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">x</span><span class="o">.</span><span class="n">vocab</span><span class="o">.</span><span class="n">itos</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">db_trn</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">y</span><span class="o">.</span><span class="n">vocab</span><span class="o">.</span><span class="n">itos</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Transformer</span><span class="p">(</span><span class="n">n_x_vocab</span><span class="p">,</span> <span class="n">n_y_vocab</span><span class="p">,</span> <span class="n">d_model</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
<span class="n">learn</span> <span class="o">=</span> <span class="n">Learner</span><span class="p">(</span><span class="n">db_trn</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="n">accuracy</span><span class="p">,</span> <span class="n">CorpusBLEU</span><span class="p">(</span><span class="n">n_y_vocab</span><span class="p">)],</span> <span class="n">loss_func</span> <span class="o">=</span> <span class="n">CrossEntropyFlat</span><span class="p">())</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Now you are going to use the awesome Learning Rate finder provided by FastAI, which is based on Leslie N. Smith's paper <a href="https://arxiv.org/abs/1506.01186">"Cyclical Learning Rates for Training Neural Networks"</a>. It runs a short mock training pass while sweeping the learning rate and recording the loss, so you don't have to do a lengthy hyperparameter search to find a good learning rate.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">learn</span><span class="o">.</span><span class="n">lr_find</span><span class="p">()</span>
<span class="n">learn</span><span class="o">.</span><span class="n">recorder</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">suggestion</span> <span class="o">=</span> <span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea ">
<div>
<style>
/* Turns off some styling */
progress {
/* gets rid of default border in Firefox and Opera. */
border: none;
/* Needs to be in here for Safari polyfill so background images work as expected. */
background-size: auto;
}
.progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {
background: #F44336;
}
</style>
<progress value='0' class='' max='1', style='width:300px; height:20px; vertical-align: middle;'></progress>
0.00% [0/1 00:00<00:00]
</div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: left;">
<th>epoch</th>
<th>train_loss</th>
<th>valid_loss</th>
<th>accuracy</th>
<th>bleu</th>
<th>time</th>
</tr>
</thead>
<tbody>
</tbody>
</table><p>
<div>
<style>
/* Turns off some styling */
progress {
/* gets rid of default border in Firefox and Opera. */
border: none;
/* Needs to be in here for Safari polyfill so background images work as expected. */
background-size: auto;
}
.progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {
background: #F44336;
}
</style>
<progress value='92' class='' max='423', style='width:300px; height:20px; vertical-align: middle;'></progress>
21.75% [92/423 01:21<04:53 19.5654]
</div>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Min numerical gradient: 3.02E-03
Min loss divided by 10: 1.74E-02
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deXxU1fn48c8z2XdCEkIgyA4CsggR
BRVExbpVaqUqdQFti1ttq239+v22P9tqrXaxWqUuuNC6VNsiWrVudRcVaNg3RXYStiyQfc/z+2Nu
YoyBhDB37kzyvF+veWXm3jv3Pocheeacc885oqoYY4wxAD6vAzDGGBM6LCkYY4xpZknBGGNMM0sK
xhhjmllSMMYY0yzS6wCOVHp6ug4YMMDrMIwxJqwsX768UFUz2jsu7JLCgAEDyM3N9ToMY4wJKyKy
oyPHWfORMcaYZpYUjDHGNLOkYIwxppklBWOMMc0sKRhjjGlmScEYY0wzSwrGGGOahd04hc7atK+M
f6/ZA4ACqIIIsVE+YiMjiI2KIC7aR1xUJHHREcRHRxAbGYGiNM0uHuET4qMjSIyNJDEmkrioCETE
szIZY0ygdauk8Ke3Pw/oOUUgITqShJgI52ckSU7CSIqNIik2kpS4KFLiokiOiyI5NpLE2EiSYqJI
jI0k2dkfGWEVNmNMaOg2SeG80Vmcd1fWl77Zqyo19Y1U1zVQXddIZW09VXUNVNU2UFnbQE19I4L/
j78I1DcoFbX1lNc0UFFT7zyc57X1lNfUU15dz86KSsqq6ymtqqOspr7d2JJiI0mNjyYjKYbeKbH0
To4lKyWWrJQ4+qbG0bdHHOmJ0VYrMca4rtskhbb+oIoIsVH+piO3NDQqZdV1lFTVUVb9ReIoq6mj
tKqeA5W1HKys40BlLQVlNWzcXcrbG/dRXdf4pfPERPrITo2jX894jukZT7/UeAamJzA0M5Hs1Hgi
fJYwjDFHr9skBa9E+IQe8dH0iI/u8HtUlZKqOnYfrCb/YBX5ByrJP1jFruIqdh2oZPmOA5RVf1ED
iY70MSg9gRFZyYzqk8zIrGRG9UkhJT7KjSIZY7owSwohSOSLRDKyT3KbxxysrGVLQQVb9pezuaCc
TfvK+GRLES+szG8+Jjs1jtF9UzjOeYzNTjmi5GSM6X4sKYSpHvHRTOgfzYT+qV/aXlhew4bdpazf
Xcq63SWsyy/htXV7m/cPTE9gXL8eHH9MD04alMbQXonWV2GMaWZJoYtJT4xhyrAMpgz7Ytr0kqo6
1ueXsCrvICt3HmTx5sLmGkWvpBhOGZLOyUPSOf3YXqQmWE3CmO7MkkI3kBIXxeQh6Uwekg74+yzy
DlTx8ZZCFm8u4r1NBSxamU+ETzhpUE/OPi6Lr43KpFdSrMeRG2OCTbRpZJYbJxe5Cfgu/vFia4Gr
VLW6xf45wO+Bpobwear62OHOmZOTo7bITmA1Nirrd5fy+vo9vLZ2L1sLKxCBcf16MH1kJtNHZDLE
mpmMCWsislxVc9o9zq2kICJ9gcXASFWtEpF/AK+q6l9aHDMHyFHV73f0vJYU3KWqbNpXzuvr9vLW
xn2szS8BYEBaPN84vi8Xjc+mX894j6M0xhypjiYFt5uPIoE4EakD4oHdLl/PHCURYXjvJIb3TuKH
Zw5lT0kVb2/cz6tr9/Cntz/nvrc+Z9KgNGZOyObc0VnERbs3xsMYE3xuNx/9ELgTqALeVNXLWu2f
A9wFFACbgJtUdVcb55kLzAU45phjJuzY0aGlRk2A5R2o5IUV+SxckceOokqSYyP55vhsZk08huG9
k7wOzxhzGKHQfJQKPA9cAhwE/gksVNWnWxyTBpSrao2IXANcoqqnH+681nzkPVVlydZinvvvTl5b
u5fahkbGH9OD2ZMHcO7oLKJsLidjQk4oJIVvAWer6nec11cCJ6nq9Yc4PgIoVtWUw53XkkJoKa6o
ZdGKPJ5ZupNthRVkJsdwxUn9mTXxGNISY7wOzxjj6GhScPMr3U7gJBGJF/9tK2cAG1seICJZLV5e
0Hq/CX09E6L57qmDePvmqTwxJ4dhmUn84c1NTLr7HW58diWLPy+ksdG9JkpjTGC51tGsqktFZCGw
AqgHVgLzReR2IFdVXwJ+ICIXOPuLgTluxWPc5fMJpx+byenHZvL5vjKeWbqTF1bm8/Lq3WSnxnFJ
Tj8umdjPxj4YE+Jc7Wh2gzUfhY/qugbeWL+Xf+Tu4qPNRURFCOeOzuLKSQMYf0wPG/dgTBB53qfg
FksK4WlrQTlPLdnBwtw8ymrqGd03he+eOtA6po0JEksKJiSV19Tzwoo8Fny0na2FFWSlxDJn8gBm
nXgMybE21bcxbrGkYEJaY6Py7mf7efTDrSzZWkxslI+zR/XmwvHZnDIk3RYNMibAQmVEszFt8vmE
M0ZkcsaITNbll/Dssp28vHo3L67aTa+kGC6akM3VJw8kI8luazUmmKymYEJGTX0D72zcz/Mr8nnn
031ER/r49sT+XDN1EJnJdteSMUfDmo9MWNtaUM6f393Ci6v8U3pfktOPa6YOIjvVJuMzpjMsKZgu
YWdRJQ+9v5mFy/NQhQuP78t1pw1mUEai16EZE1YsKZguZffBKuZ/sJVnl+2krqGRc0dnccO0IYzI
ansNa2PMl1lSMF1SQVkNjy3eytOf7KCitoEzR/Ti+mlDGH9MavtvNqYbs6RgurSSyjr+8vF2Fny8
jYOVdUwenMYN04YweXCajZQ2pg2WFEy3UFFTz9+W7uTRD7eyv6yGsf16cMNpgzlzRCY+G+tgTLNQ
mCXVGNclxETyvSmD+OCWadx54XEUV9Qw96nlfH3eYj7ZUuR1eMYEzMebC9lVXOn6dSwpmC4hNiqC
y07sz7s/Po0/XjyWg5V1zHp0Cdc8lcv2wgqvwzPmqKgqsxcs4+ml7q86aUnBdCmRET6+OT6bt388
lZ9+bTgffl7I9Hvf51cvr2d/WbXX4RnTKeU19dQ1KGkJ0a5fy5KC6ZJioyK4YdoQ3vvJaVw0Ppsn
P9nBlN+9y53/3kBheY3X4RlzRIoragHomeD+tC+WFEyX1is5lrsvGsPbN0/l3NFZPL54G6f+9l3+
8MZnVNTUex2eMR1S1JwU3J9J2JKC6RYGpCfwx4vH8dbNU5k+MpN5727m9Hve44WVeYTbHXim+zlg
NQVj3DEoI5H7Zx3P89dNIjM5lpv+vpqLHvqYNXkHvQ7NmENqqilYn4IxLpnQvycvXn8yv5s5hp3F
Vcz480f8/MW1lFTWeR2aMV/RVFNItaRgjHt8PuHinH6885OpzJ40gL8t3cm0e97jH7m7aGy0JiUT
OooraomO9JEQHeH6tVxNCiJyk4isF5F1IvKsiMS22h8jIn8Xkc0islREBrgZjzFtSY6N4pcXjOLl
G09hQFo8tyxcw6xHl7DNxjeYEFFUUUtaQnRQpnBxLSmISF/gB0COqh4HRACXtjrsO8ABVR0C3Av8
1q14jGnPqD4pLLx2Mnd/czQb9pRy9n0fMP+DLdQ3NHodmunmDlTUkhrvftMRuN98FAnEiUgkEA/s
brV/BvBX5/lC4Ayx2cyMh3w+4dKJx/DWzVM5dWgGv3n1Uy566GM+21vmdWimGyuqqCUtMcyTgqrm
A38AdgJ7gBJVfbPVYX2BXc7x9UAJkNb6XCIyV0RyRSS3oKDArZCNaZaZHMujV07ggVnHs+tAFV9/
YDF/fnez1RqMJ4q7Qk1BRFLx1wQGAn2ABBG5vDPnUtX5qpqjqjkZGRmBDNOYQxIRvj62D/+5aQrT
R2by+zc+46KHPubzfVZrMMF1oKKWnkG48wjcbT46E9imqgWqWgcsAia3OiYf6AfgNDGlADa1pQkp
aYkx/Pmy8cz79vHsLK7kvAcW89Qn223QmwmKmvoGymrqgzJGAdxNCjuBk0Qk3uknOAPY2OqYl4DZ
zvOZwDtqv2kmRJ0/pg//uXkqpwxJ5//9az0//udqqusavA7LdHEHKvxjZ4IxRgHc7VNYir/zeAWw
1rnWfBG5XUQucA57HEgTkc3AzcCtbsVjTCCkJ8bw2JU53HTmMF5Ymc9FD30clDnuTfdVHMTRzOC/
O8g1qvoL4BetNt/WYn818C03YzAm0Hw+4YdnDmV0djI/fG4VX5+3mHmzxnPK0HSvQzNd0BczpIZ5
TcGYru70YzN5+funkJkUy5VPLOXxxdusn8EEXHGlJQVjwsaA9ASev34yZ47I5I5XNvDThWuoqbd+
BhM4xc76H5YUjAkTiTGRPHz5BH5wxlAWLs/j0vlLbCEfEzDFFbWIQI9wH6dgTHfi8wk3Tx/GQ5eN
Z+OeUi5/bGlzW7AxR6O4spYecVFE+IIz2YMlBWMC6JzRWTw++wS2FVbw7UeXNE95bExnFQdx4BpY
UjAm4E4eks6jV+awtbCCyx5bysFKSwym84rKa0kLwoprTSwpGOOCKcMymH/FBDbvL+eKx5dRWm2L
95jOOVBZS2oQ1mZuYknBGJecNrwXj1wxgY17SrnhmRU2mZ7pFH/zkdUUjOkSph3bizsvPI4PPy/k
Vy9vsHEM5og0NioHKuvoGcSagqsjmo0xcMkJx7C1oIJHPtjK4IwE5pw80OuQTJgora6joVGDWlOw
pGBMENxy9rFsLazg9lc20D8tgWnH9vI6JBMGioI87xFY85ExQRHhE+67ZBzH9k7m+39bweb9tiaD
aV/TWJdgzZAKlhSMCZqEmEgen5NDbFQE1z+zgqpamw7DHF6wZ0gFSwrGBFVWShz3XjKOz/eX84uX
1nkdjglxwZ4hFSwpGBN0U4ZlcMNpQ/hHbh7PL8/zOhwTwiwpGNNN/OjMoZw4sCc/f3Gd9S+YQyqu
qCU+OoLYqIigXdOSgjEeiIzwcf+s44mPtv4Fc2jBnvcILCkY45nM5FjuvWQcm/aVc/drrZcvN8aS
gjHdzpRhGVx98kD++skO3t9U4HU4JsR0qaQgIsNFZFWLR6mI/KjVMaeJSEmLY2471PmM6apuOXs4
Q3sl8tN/rrapts2XFFfU0jNIi+s0cS0pqOpnqjpOVccBE4BK4IU2Dv2w6ThVvd2teIwJVbFREdx3
6TgOVNbysxfX2vxIplmXqim0cgawRVV3BOl6xoSVUX1SuHn6cF5du5dFK/K9DseEgKraBqrqGuiZ
2DWTwqXAs4fYN0lEVovIayIyqq0DRGSuiOSKSG5BgbW7mq5p7pRBTBzQk1+8tJ49JVVeh2M8VlTh
X+e7yzQfNRGRaOAC4J9t7F4B9FfVscADwIttnUNV56tqjqrmZGRkuBesMR6K8Al/+NZY6hoaueOV
DV6HYzx2oMK/MFNXbD46B1ihqvta71DVUlUtd56/CkSJSHoQYjImJB2TFs+Npw/h1bV7ee+z/V6H
YzzUVFNI64LNR7M4RNORiPQWEXGeT3TiKQpCTMaErO9NGcSg9AR+8dJ6qutsUFt31TxDaldqPhKR
BGA6sKjFtmtF5Frn5UxgnYisBu4HLlW79cJ0czGREdw+4zh2FFXy8PtbvA7HeOSLGVKDt8AOuLzI
jqpWAGmttj3c4vk8YJ6bMRgTjk4Zms7Xx/bhwfe28I1xfRmQnuB1SCbIiitqifAJyXHBXQvNRjQb
E6J+ft4IoiN83PbSehu70A0dqKwlNT4ap4U9aCwpGBOiMpNjuWn6MD7YVMA7n1qnc3dTVF4b1MV1
mlhSMCaEXTmpPwPTE/jNqxupb2j0OhwTRF6MZgZLCsaEtKgIH7eecyxbCip47r+7vA7HBFFxpSUF
Y0wbzhqZycQBPbnvrU2UVdd5HY4JEqspGGPaJCL87LwRFJbX8sj7W70OxwRBXUMjByvrgj5wDSwp
GBMWxvbrwYxxfXj0w602L1I30DSFelpicMcogCUFY8LGT84ajgJ/eGOT16EYlxWW+5NCujUfGWMO
pV/PeK6aPIBFK/PYtK/M63CMi4qtpmCM6Yhrpw4mLiqCee9s9joU4yKvJsMDSwrGhJXUhGiunDSA
l9fsZvP+cq/DMS5paj6ywWvGmHZ979SBDC3dx97Lr4bkZPD5/D+vvx622AR6XUFReQ2RPiE5Niro
17akYEyYSfvwHV55/AYmvr0IyspA1f/zscdgzBh47TWvQzRHqajcP0bB5wvuvEdgScGY8LJlC8yc
SXRNNdGNrdZaqKuDykqYOdNqDGGuqKLGk05msKRgTHi55x7/H//DqauDe+8NTjzGFYXltaR70MkM
lhSMCS9PP92xpPDUU8GJx7iiqKLGk05msKRgTHgp7+AdRx09zoSkovLa0G4+EpHBIhLjPD9NRH4g
Ij3cDc0Y8xWJiYE9zoScytp6KmsbPBmjAB2vKTwPNIjIEGA+0A/4m2tRGWPadvnlENXObYpRUXDF
FcGJxwRcUfMUFyFcUwAaVbUeuBB4QFV/CmS5F5Yxpk0//nHHksJNNwUnHhNwRc1TXIR2TaFORGYB
s4FXnG2H/Z8pIsNFZFWLR6mI/KjVMSIi94vIZhFZIyLjj7wIxnQjgwfDwoUQH/+V5FDri6AxLt6/
f/BgjwI0R6u4eYqL0K4pXAVMAu5U1W0iMhA47O0NqvqZqo5T1XHABKASeKHVYecAQ53HXOChIwne
mG7pnHNgzRqYO7d5RHNjUjL/OP4c7r777/79Jmx5OcUFdDApqOoGVf2Bqj4rIqlAkqr+9giucwaw
RVV3tNo+A3hS/ZYAPUTEmqWMac/gwTBvHpSUQEMDvtIStv3ytzy+18f2wgqvozNHoalPIaSbj0Tk
PRFJFpGewArgURH54xFc51Lg2Ta29wVaLjyb52xrff25IpIrIrkFBQVHcFljuo9rpg4iKkK4/53P
vQ7FHIWi8hrioiKIj4705PodbT5KUdVS4Jv4v9mfCJzZkTeKSDRwAfDPzoUIqjpfVXNUNScjI6Oz
pzGmS+uVFMvlJ/bnxZX5bLPaQtgqqqj1rJYAHU8KkU6zzsV80dHcUecAK1R1Xxv78vHf3tok29lm
jOmEa6YOJjrSxwNWWwhbheXezXsEHU8KtwNv4O8X+K+IDAI6+r9uFm03HQG8BFzp3IV0ElCiqns6
eF5jTCsZSTFccZK/trC1wEY1h6Oi8lpPluFs0tGO5n+q6hhVvc55vVVVL2rvfSKSAEwHFrXYdq2I
XOu8fBXYCmwGHgWuP8L4jTGtzJ3iry3Y6mzhyT9DaognBRHJFpEXRGS/83heRLLbe5+qVqhqmqqW
tNj2sKo+7DxXVb1BVQer6mhVze18UYwx4K8tXDlpAC+ustpCuFFVT+c9go43Hy3A39TTx3m87Gwz
xoSguVMGOX0LVlsIJ6VV9dQ3qmdjFKDjSSFDVReoar3z+AtgtwEZE6LSE/21hX+tymeL1RbCRqEz
mjk9DGoKRSJyuYhEOI/LgSI3AzPGHJ25UwYRExnBA2/bnUjhwuuBa9DxpHA1/ttR9wJ7gJnAHJdi
MsYEQHpiDLMnD+Bfq3ezLr+k/TcYzzXNe9Qz1JuPVHWHql6gqhmq2ktVvwG0e/eRMcZb108bTGp8
NLe/sgFV9Toc046meY/CofmoLTcHLApjjCuSY6O4afowlm0r5o31e70Ox7SjqfkoNT7EawqHIAGL
whjjmlkn9GNYZiK/efVTauobvA7HHEZRRQ0pcVFER3q3UvLRXNnqosaEgcgIHz8/byQ7iyv5y0fb
vQ7HHIZ/jIJ3tQRoJymISJmzOE7rRxn+8QrGmDAwZVgG04ZnMO+dzRSW13gdjjmEwvIaz5bhbHLY
pKCqSaqa3MYjSVW9mdfVGNMpPztvJJV1Ddzz5iavQzGH4PUMqXB0zUfGmDAypFciV5zUn+f+u9Nu
UQ1RReXeznsElhSM6VZuOnMYqfHR/PKl9XaLaoipb2jkQGUdaaHcfGSM6VpS4qO45WvDyd1xgBdX
2dIloaS4smmMgtUUjDFBdHFOP8Zkp3DXq59SXlPvdTjG8cUUF1ZTMMYEkc8n/OqCUewvq7F5kUJI
U1LwcooLsKRgTLd0/DGpzJyQzRMfbbNZVENEUfMMqZYUjDEe+J+zjyU2MsI6nUNEc/ORdTQbY7yQ
kRTDj6YP48PPC3lj/T6vw+n2iipqiPAJKXFRnsZhScGYbmz2pP4Mz0zijlc2UFVr8yJ5qai8lp4J
0fh83k4r52pSEJEeIrJQRD4VkY0iMqnV/tNEpEREVjmP29yMxxjzZZERPm6fMYr8g1U8+J4t3eml
wvJaT5fhbOL2VBV/Al5X1ZkiEg3Et3HMh6p6vstxGGMO4cRBacwY14dH3t/KN8dnMzA9weuQuqWi
ihpP11Fo4lpNQURSgCnA4wCqWquqB926njGm8/7v3BFER/r41cvW6eyFovIathVWkJHUhZMCMBAo
ABaIyEoReUxE2voKMklEVovIayIyqq0TichcEckVkdyCggIXQzame8pMjuVHZw7lvc8K+M8G63QO
ppr6Bq55ajlVtQ3MmTzA63BcTQqRwHjgIVU9HqgAbm11zAqgv6qOBR4AXmzrRKo6X1VzVDUnIyPD
xZCN6b5mTx7AsMxEfvHSekqr67wOp1tQVf530VpydxzgnovHMrZfD69DcjUp5AF5qrrUeb0Qf5Jo
pqqlqlruPH8ViBKRdBdjMsYcQlSEj7svGsO+0mrufGWj1+F0Cw++t4VFK/K56cxhnD8mNJaocS0p
qOpeYJeIDHc2nQFsaHmMiPQWEXGeT3TiKXIrJmPM4Y0/JpW5Uwbz99xdvPvZfq/D6dJeX7eH37/x
GReM7cMPzhjidTjN3B6ncCPwjIisAcYBvxGRa0XkWmf/TGCdiKwG7gcuVevlMsZTN00fyrDMRG59
fg0lldaM5Ja7XvuUUX2S+d3MMTjfjUOCq0lBVVc5fQFjVPUbqnpAVR9W1Yed/fNUdZSqjlXVk1T1
YzfjMca0LyYygj98ayyF5bXc/sqG9t9gjtiBilp2FFXy9bF9iI2K8DqcL7ERzcaYrxiT3YPrTxvM
8yvyeMvuRgq4tc7Kd2P6pngcyVdZUjDGtOnG04dybO8kbnl+DfkHq7wOp0tpSgqjLCkYY8JFdKSP
By8bT119I9c9vZzqOpsbKVDW5B1kYHqC55PftcWSgjHmkAZlJHLPxWNZk1fCL19a73U4XcbavBKO
C8FaAlhSMMa046xRvblh2mCe++8unlu20+twwl5heQ27S6pDsj8BLCkYYzrg5unDOXVoOrf9az2r
d9kUZkejqT9hdLYlBWNMmIrwCfdfejwZSTHc/I9V1NY3eh1S2FqbV4IIjOqT7HUobbKkYIzpkNSE
aG6fMYotBRX89ePtXocTttbklTAoPYGk2NDrZAZLCsaYI3DGiEymDc/gT29/zv7Saq/DCUtr8w8y
Jtv7ie8OxZKCMeaI3Pb1UdTWN3L3a596HUrY2V9azb7SGkaHaCczWFIwxhyhgekJfPfUgSxamU/u
9mKvwwkrzSOZQ7STGSwpGGM64funDyErJZbb/rWehkabw7Kj1uSV4BMYGaKdzGBJwRjTCfHRkfzs
vBFs2FPKk59s9zqcsLE2v4ShvZKIj470OpRDsqRgjOmU80ZnMW14Bnf+eyPvb7JlctujqqwJ4ZHM
TSwpGGM6RUS4f9bxDMtM4rqnl7POaS83bdtbWk1heU1I9yeAJQVjzFFIio1iwVUnkBofzZwF/2VX
caXXIYWsNXmhPZK5iSUFY8xRyUyO5a9Xn0BdQyOzFyzjQEWt1yGFpLV5JUT4hJFZodvJDJYUjDEB
MKRXEo/NziHvQBXfezLXptluw9r8EoZlJoXcSmutWVIwxgTECQN68seLx5K74wC3LFyDLbf+ZTuK
KhickeB1GO2ypGCMCZjzx/Thp18bzkurd3PvfzZ5HU7IUFX2lFTTp0ec16G0y9WkICI9RGShiHwq
IhtFZFKr/SIi94vIZhFZIyLj3YzHGOO+608bzMU52dz/zmaeX57ndTghobiilpr6RrJSYr0OpV1u
j6D4E/C6qs4UkWggvtX+c4ChzuNE4CHnpzEmTIkIv/7GaHYVV3HrojX06RHHpMFpXoflqT0l/skD
s1K6cU1BRFKAKcDjAKpaq6qtV+eYATypfkuAHiKS5VZMxpjgiI708fDlE+iflsDcp3LZtK/M65A8
tftgFQB9eoR+TcHN5qOBQAGwQERWishjItK6l6UvsKvF6zxn25eIyFwRyRWR3IICGzlpTDhIiY9i
wZwTiI2KYM4Ty9hb0n2n2raagl8kMB54SFWPByqAWztzIlWdr6o5qpqTkZERyBiNMS7q1zOeBXNO
oKSqjjkLllFWXed1SJ7YXVJFdISPtIRor0Npl5tJIQ/IU9WlzuuF+JNES/lAvxavs51txpgu4ri+
KTx0+QQ27y/nuqdXdMulPPccrKZ3Siw+n3gdSrtcSwqquhfYJSLDnU1nABtaHfYScKVzF9JJQImq
7nErJmOMN6YMy+Cub45m8eZCbv7Hqm433fbekuqwuPMI3L/76EbgGefOo63AVSJyLYCqPgy8CpwL
bAYqgatcjscY45Fv5fSjuKKWu177lNioCH530Ziw+OYcCLtLqjhhQE+vw+gQV5OCqq4CclptfrjF
fgVucDMGY0zouGbqYKrqGrjvrc+JjfJxx4zjEOnaiaGxUdlXajUFY4xp0w/PGEp1XSMPv7+FmMgI
fn7eiC6dGArLa6hrUEsKxhjTFhHhf84eTnVdA48v3kZCdAQ3nzW8/TeGqd1hdDsqWFIwxnhARPjF
10dSWVvP/e9sJiU+mu+cMtDrsFyxxxm4lhUGA9fAkoIxxiMiwl3fHENZdT13vLKBHnFRXDQh2+uw
Aq6pptAnTGoKNkuqMcYzET7hvkvHccqQdG55fg1vrt/rdUgBt+dgFbFRPnrER3kdSodYUjDGeCom
MoJHrpjA6L4pfP/Zlby9cZ/XIQXUnpJq+qTEhU1nuiUFY4znEmIi+ctVJzA8M4nvPZnLYx9u7TKL
9OwuqQqb/gSwpGCMCRE94qP5+zUncdbI3vz63xv5vxfWUdcQ/lNi7DlYHTZ3HoElBWNMCImPjuTB
y8Zz/WmDeXbZTuYsWEZJVfhOolff0Mj+smr6hMkYBbCkYIwJMT6fcMvZx3LPt8aybFsxlz22hAMV
tV6H1Sn7ympoVMgKg2U4m14zjf0AAA3eSURBVFhSMMaEpIsmZDP/ihw27Svn0vlLKCir8TqkI9Y8
RsFqCsYYc/SmHduLBXNOYGdxJZfM/yTsFuppHqNgNQVjjAmMk4ek8+R3JrK/tIaLH/mEHUUVXofU
YVZTMMYYF5wwoCdPf/dESqvruPDBj1m+44DXIXXInpJqEmMiSYoNj4FrYEnBGBMmxvXrwaLrJpMU
G8msR5fw7zWhvx7XnpKqsKolgCUFY0wYGZSRyKLrJjO6bwo3/G0FD7+/JaQHue0pqQ6rO4/AkoIx
JsykJcbwzHdP5LwxWdz92qf86uUNNIbo8p67D4bXGAWwWVKNMWEoNiqCBy49nl5JMSz4aDulVXX8
duYYoiJC53tuTX0DheU1YTWaGSwpGGPClM8n3Hb+SFLjo/njfzZRWl3HvG+PJzYqwuvQANhX4h9X
EU7zHoHLzUcisl1E1orIKhHJbWP/aSJS4uxfJSK3uRmPMaZrERF+cMZQ7pgxirc/3c/sJ0JnWozd
Jf7bUcNlHYUmwagpTFPVwsPs/1BVzw9CHMaYLuqKSQNIjoviJ/9czYUPfsQTs09gQHqCpzHtKQmv
FdeahE4DnDHGHIUZ4/ry9HdO5EBFLTP+/BGfbCnyNJ7dB8NrxbUmbicFBd4UkeUiMvcQx0wSkdUi
8pqIjGrrABGZKyK5IpJbUFDgXrTGmLB24qA0XrzhZDKSYrji8aU8t2ynZ7HsKamiR3wUcdGh0cfR
UW4nhVNUdTxwDnCDiExptX8F0F9VxwIPAC+2dRJVna+qOaqak5GR4W7Expiw1j8tgUXXT2bykHRu
XbSWn7+4ltr64K/LsKu4KuzuPAKXk4Kq5js/9wMvABNb7S9V1XLn+atAlIikuxmTMabrS46N4onZ
OVwzdRBPL9nJpUGeTC//YBUfbS5k8uC0oF0zUFxLCiKSICJJTc+Bs4B1rY7pLc7CpSIy0YnH24ZA
Y0yXEBnh43/PGcGfvz2eT/eWcf4Di1m6NTh/Xp5YvA0Frjp5QFCuF0hu1hQygcUishpYBvxbVV8X
kWtF5FrnmJnAOueY+4FLNZTHrBtjws55Y7J48YaTSY6N5LLHlvLy6t2uXq+kqo7nlu3k/DFZZKfG
u3otN7h2S6qqbgXGtrH94RbP5wHz3IrBGGMAhmUm8eL3T+Y7f/kvP3xuJVV1DVyc08+Vaz2zdAcV
tQ3MnTLIlfO7zW5JNcZ0C8mxUfz16omcPCSdWxau4S8fbQv4NWrqG1jw0XZOHZrOqD4pAT9/MFhS
MMZ0G/HRkTw2O4fpIzP55csb+PO7mwN6/n+t3E1BWU3Y1hLAkoIxppuJiYzgwcvGM2NcH37/xmfc
99amgJy3sVGZ/+FWRmYlc8qQ8L2J0ibEM8Z0O1ERPv548TiiInzc99bnNDYqN00fhnMzZKe8+9l+
Nu8v575Lxh3VebxmScEY0y1F+ITfXTQGn8D972ymQZWfnDW8U3/QK2rqufu1T+nbI47zxmS5EG3w
WFIwxnRbPp9w9zfHEOET/vzuFuoblVvPPvaIEoOqcuuitWwpKOfJq08MqTUdOsOSgjGmW/P5hDu/
MZoIn/DI+1spq67njhnHEeHrWGL4y8fbeXn1bn76teGcMjR8+xKaWFIwxnR7Pp9wx4zjSIqN4qH3
tnCwspZ7LxlHTOThJ7PL3V7Mnf/eyJkjMrlu6uAgResuSwrGGIN/wZ7/OftYesZHc+erGymtyuWR
KyZQW9/IZ/vK2LSvjIOVdaQlRpORGENSbBQ/fG4lfVPjuOfisfg6WLMIdZYUjDGmhe9NGURqQjT/
8/wacn79FlV1DYc8NjbKx1+vnkhKXFQQI3SXJQVjjGll5oRsMpJieH3dHgamJzC8dzLDM5PomRBN
cUUtBWU1FJRX0z8tgcEZiV6HG1CWFIwxpg1Th2UwddhX12/pnRJL75RYIDynsWhPeN87ZYwxJqAs
KRhjjGlmScEYY0wzSwrGGGOaWVIwxhjTzJKCMcaYZpYUjDHGNLOkYIwxppmoqtcxHBERKQB2tLEr
BSjp5Oum500/04HCTobY+jpHckxb2zsSd8vnLbe5WQ43y9DyeXf/LLwuQ8vnofJZ2O9258rRX1W/
OhqvNVXtEg9gfmdfNz1v8TM3UHEcyTFtbe9I3G2Vwe1yuFkG+yxCpwyh+FnY7/bRlaO9R1dqPnr5
KF6/fIhjAhHHkRzT1vaOxN3yeSDK0JHzuFmGjly/I7rCZ+F1GToaQ3sCWQ773XZR2DUfBYOI5Kpq
jtdxHK2uUI6uUAboGuWwMoQON8vRlWoKgTTf6wACpCuUoyuUAbpGOawMocO1clhNwRhjTDOrKRhj
jGlmScEYY0yzLp8UROQJEdkvIus68d4JIrJWRDaLyP0iIi323Sgin4rIehH5XWCj/kocAS+DiPxS
RPJFZJXzODfwkX8lFlc+C2f/j0VERSQ9cBG3GYcbn8UdIrLG+RzeFJE+gY/8K7G4UY7fO78Ta0Tk
BRHpEfjIvxSHG2X4lvM73SgirnVIH03shzjfbBH53HnMbrH9sL83bXLrXtdQeQBTgPHAuk68dxlw
EiDAa8A5zvZpwFtAjPO6VxiW4ZfAT8L9s3D29QPewD+oMT3cygAktzjmB8DD4fhZAGcBkc7z3wK/
DcMyjACGA+8BOaEWuxPXgFbbegJbnZ+pzvPUw5XzcI8uX1NQ1Q+A4pbbRGSwiLwuIstF5EMRObb1
+0QkC/8v6xL1/+s+CXzD2X0dcLeq1jjX2B+GZQg6F8txL3AL4PpdE26UQVVLWxyaQPiW401VrXcO
XQJkh2EZNqrqZ27GfTSxH8LXgP+oarGqHgD+A5zd2d//Lp8UDmE+cKOqTgB+AjzYxjF9gbwWr/Oc
bQDDgFNFZKmIvC8iJ7gabduOtgwA33eq+k+ISKp7oR7WUZVDRGYA+aq62u1AD+OoPwsRuVNEdgGX
Abe5GOvhBOL/VJOr8X8zDbZAliHYOhJ7W/oCu1q8bipPp8oZ2cGLdhkikghMBv7Zonkt5ghPE4m/
qnYScALwDxEZ5GRj1wWoDA8Bd+D/VnoHcA/+X+SgOdpyiEg88H/4my08EaDPAlX9GfAzEflf4PvA
LwIWZAcEqhzOuX4G1APPBCa6Dl83YGUItsPFLiJXAT90tg0BXhWRWmCbql4Y6Fi6XVLAXzs6qKrj
Wm4UkQhgufPyJfx/NFtWf7OBfOd5HrDISQLLRKQR/wRVBW4G3sJRl0FV97V436PAK24GfAhHW47B
wEBgtfOLlA2sEJGJqrrX5dibBOL/U0vPAK8S5KRAgMohInOA84EzgvUlqYVAfxbB1GbsAKq6AFgA
ICLvAXNUdXuLQ/KB01q8zsbf95BPZ8rpVkdKKD2AAbTo0AE+Br7lPBdg7CHe17qT5lxn+7XA7c7z
YfirbhJmZchqccxNwHPh+Fm0OmY7Lnc0u/RZDG1xzI3AwnD8LICzgQ1ARjDid/P/Ey53NHc2dg7d
0bwNfydzqvO8Z0fK2WZcwfrwvHoAzwJ7gDr83/C/g//b5evAauc/8W2HeG8OsA7YAszjixHg0cDT
zr4VwOlhWIangLXAGvzfnrLcLINb5Wh1zHbcv/vIjc/ieWf7GvyTnvUNx88C2Iz/C9Iq5+HqXVQu
leFC51w1wD7gjVCKnTaSgrP9aufffzNw1ZH83rR+2DQXxhhjmnXXu4+MMca0wZKCMcaYZpYUjDHG
NLOkYIwxppklBWOMMc0sKZguQUTKg3y9x0RkZIDO1SD+GVLXicjL7c0uKiI9ROT6QFzbmNbsllTT
JYhIuaomBvB8kfrF5G6uahm7iPwV2KSqdx7m+AHAK6p6XDDiM92L1RRMlyUiGSLyvIj813mc7Gyf
KCKfiMhKEflYRIY72+eIyEsi8g7wtoicJiLvichC8a8T8EzTfPTO9hznebkzod1qEVkiIpnO9sHO
67Ui8usO1mY+4YvJ/hJF5G0RWeGcY4ZzzN3AYKd28Xvn2J86ZVwjIr8K4D+j6WYsKZiu7E/Avap6
AnAR8Jiz/VPgVFU9Hv+MpL9p8Z7xwExVneq8Ph74ETASGASc3MZ1EoAlqjoW+AD4Xovr/0lVR/Pl
2Srb5MzRcwb+EeYA1cCFqjoe/xoe9zhJ6VZgi6qOU9WfishZwFBgIjAOmCAiU9q7njFt6Y4T4pnu
40xgZItZJ5Od2ShTgL+KyFD8s8RGtXjPf1S15Tz3y1Q1D0BEVuGfr2Zxq+vU8sWEgsuB6c7zSXwx
f/3fgD8cIs4459x9gY3458MH/3w1v3H+wDc6+zPbeP9ZzmOl8zoRf5L44BDXM+aQLCmYrswHnKSq
1S03isg84F1VvdBpn3+vxe6KVueoafG8gbZ/Z+r0i865Qx1zOFWqOs6ZCvwN4AbgfvxrK2QAE1S1
TkS2A7FtvF+Au1T1kSO8rjFfYc1Hpit7E/+sowCISNO0xCl8MYXwHBevvwR/sxXApe0drKqV+Jfj
/LGIROKPc7+TEKYB/Z1Dy4CkFm99A7jaqQUhIn1FpFeAymC6GUsKpquIF5G8Fo+b8f+BzXE6Xzfg
n/Ic4HfAXSKyEndryz8CbhaRNfgXRylp7w2quhL/bKmz8K+tkCMia4Er8feFoKpFwEfOLay/V9U3
8TdPfeIcu5AvJw1jOsxuSTXGJU5zUJWqqohcCsxS1Rntvc8YL1mfgjHumQDMc+4YOkiQlzs1pjOs
pmCMMaaZ9SkYY4xpZknBGGNMM0sKxhhjmllSMMYY08ySgjHGmGb/HyRou9sexXjkAAAAAElFTkSu
QmCC
" />
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>It is common to pick a point a bit before the suggested one, since a slightly smaller learning rate gives some safety margin against the loss diverging.</p>
</div>
</div>
</div>
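As a toy illustration only (not fastai's actual implementation), the "Min numerical gradient" suggestion printed above amounts to: sweep the learning rate on a log scale, record the loss, suggest the LR where the loss is falling fastest, and then back off a bit before training. With a synthetic loss curve:

```python
import numpy as np

# Synthetic LR sweep: 51 learning rates spaced evenly on a log scale.
lrs = np.logspace(-5, 0, 51)

# Toy S-shaped loss curve: flat, then falling fastest near lr ≈ 3e-3, then flat.
losses = -np.tanh(np.log10(lrs) + 2.5)

# Suggest the LR at the steepest descent (most negative numerical gradient),
# analogous to the "Min numerical gradient: 3.02E-03" line in the output above.
suggested = lrs[np.argmin(np.gradient(losses))]

# Pick a point a bit before the suggestion as a safety margin.
max_lr = suggested / 10
```

The division by 10 here is just one common rule of thumb; the notebook simply eyeballs the plot and picks <code>5e-4</code>.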
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">max_lr</span> <span class="o">=</span> <span class="mf">5e-4</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>DRUM ROLL PLEASE!!!!!</strong> You are now finally going to start training your model! Specifically for 8 epochs, because that is what the original code in the NLP course used and it also happened to work best during my training. You are also adding a few callbacks: automatically saving the best-performing model, early stopping, and showing the training and validation loss graph. Since you are using early stopping, feel free to try a higher epoch count; training will stop once the validation loss stops improving.</p>
</div>
</div>
</div>
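To make the early-stopping behaviour concrete, here is a plain-Python sketch of the patience/min_delta logic. This is an approximation of what fastai's EarlyStoppingCallback does with the settings used below (min_delta=0.01, patience=3), not the library code itself:

```python
def early_stop_epoch(valid_losses, min_delta=0.01, patience=3):
    """Return the (0-indexed) epoch at which early stopping would trigger,
    or None if training runs to completion. A sketch of the usual
    patience/min_delta rule, not fastai's actual implementation."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(valid_losses):
        if best - loss > min_delta:  # improved by more than min_delta
            best = loss
            wait = 0
        else:
            wait += 1
            if wait > patience:  # too many epochs without real improvement
                return epoch
    return None

# Five epochs with no improvement at all: stops after patience runs out
print(early_stop_epoch([1.0, 1.0, 1.0, 1.0, 1.0]))  # 4
```

Note that improvements smaller than min_delta (like the last couple of epochs in the run below) count as "no improvement," which is why a generous epoch count is safe to use here.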
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">train_model</span><span class="p">(</span><span class="n">learn</span><span class="p">,</span> <span class="n">epochs</span><span class="p">,</span> <span class="n">model_name</span><span class="p">,</span> <span class="n">max_lr</span> <span class="o">=</span> <span class="mf">5e-4</span><span class="p">):</span>
<span class="sd">"""Trains a model using save-model, early-stopping, and show-graph callbacks."""</span>
<span class="n">callback_fns</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">callbacks</span><span class="o">.</span><span class="n">SaveModelCallback</span><span class="p">(</span>
<span class="n">learn</span><span class="p">,</span> <span class="n">every</span><span class="o">=</span><span class="s1">'improvement'</span><span class="p">,</span>
<span class="n">monitor</span><span class="o">=</span><span class="s1">'valid_loss'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">model_name</span><span class="si">}</span><span class="s1">_save_model'</span>
<span class="p">),</span>
<span class="n">callbacks</span><span class="o">.</span><span class="n">EarlyStoppingCallback</span><span class="p">(</span>
<span class="n">learn</span><span class="p">,</span> <span class="n">monitor</span><span class="o">=</span><span class="s1">'valid_loss'</span><span class="p">,</span> <span class="n">min_delta</span> <span class="o">=</span> <span class="mf">0.01</span><span class="p">,</span>
<span class="n">patience</span> <span class="o">=</span> <span class="mi">3</span>
<span class="p">),</span>
<span class="n">ShowGraph</span><span class="p">(</span><span class="n">learn</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">learn</span><span class="o">.</span><span class="n">fit_one_cycle</span><span class="p">(</span><span class="n">epochs</span><span class="p">,</span> <span class="n">max_lr</span><span class="p">,</span> <span class="n">div_factor</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">callbacks</span> <span class="o">=</span> <span class="n">callback_fns</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">epochs</span> <span class="o">=</span> <span class="mi">8</span>
<span class="n">model_name</span> <span class="o">=</span> <span class="s1">'comment_gen'</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Training on Google Colab can take anywhere from ~20 to 60 minutes depending on the type of GPU they give you. So, relax, get an IV caffeine drip going, and let your model cook in peace :).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">train_model</span><span class="p">(</span><span class="n">learn</span><span class="p">,</span> <span class="n">epochs</span><span class="p">,</span> <span class="n">model_name</span><span class="p">,</span> <span class="n">max_lr</span> <span class="o">=</span> <span class="n">max_lr</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea ">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: left;">
<th>epoch</th>
<th>train_loss</th>
<th>valid_loss</th>
<th>accuracy</th>
<th>bleu</th>
<th>time</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1.182219</td>
<td>1.133453</td>
<td>0.828182</td>
<td>0.791774</td>
<td>06:46</td>
</tr>
<tr>
<td>1</td>
<td>0.920205</td>
<td>0.954264</td>
<td>0.841556</td>
<td>0.799681</td>
<td>06:47</td>
</tr>
<tr>
<td>2</td>
<td>0.812330</td>
<td>0.875513</td>
<td>0.849487</td>
<td>0.804000</td>
<td>06:44</td>
</tr>
<tr>
<td>3</td>
<td>0.752023</td>
<td>0.828835</td>
<td>0.853668</td>
<td>0.807183</td>
<td>06:45</td>
</tr>
<tr>
<td>4</td>
<td>0.679716</td>
<td>0.794862</td>
<td>0.856593</td>
<td>0.809325</td>
<td>06:43</td>
</tr>
<tr>
<td>5</td>
<td>0.653454</td>
<td>0.777795</td>
<td>0.859418</td>
<td>0.811010</td>
<td>06:42</td>
</tr>
<tr>
<td>6</td>
<td>0.611860</td>
<td>0.770059</td>
<td>0.860419</td>
<td>0.812164</td>
<td>06:49</td>
</tr>
<tr>
<td>7</td>
<td>0.605370</td>
<td>0.769881</td>
<td>0.860601</td>
<td>0.812119</td>
<td>06:45</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Better model found at epoch 0 with valid_loss value: 1.133453130722046.
</pre>
</div>
</div>
<div class="output_area">
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0
dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de3Cd9X3n8ff3nOdcdb/ZkiXjGwkY
jHGwwpLAUBqaFEjIZQKBTtJmaXfcTdMFMruzQ6ezTTLTnUk723Y3TZqUNPSym0ATJwzZNGkuLZTJ
hkBkMMZgAtjYkWRblmTdde7nt3+cR7Ysy9KRfI7Okfx5zWjOo+f6fXzkz3nO7/k9z2POOUREpHoF
Kl2AiIgsTEEtIlLlFNQiIlVOQS0iUuUU1CIiVc4rx0obmprdW7ZtLceqRUTWpH379g0559rmm1aW
oG7bsJGenp5yrFpEZE0ys2MXmlaepg/1zRYRKZmyBLVyWkSkdMoT1OVYqYjIJaosbdQ6ohaRpchk
MvT19ZFMJitdStlFo1G6uroIhUJFL1OeoNYxtYgsQV9fH3V1dWzevBkzq3Q5ZeOcY3h4mL6+PrZs
2VL0cmqjFpGKSyaTtLS0rOmQBjAzWlpalvzNQW3UIlIV1npIz1jOfpbpiFpRLSJSKrqEXEQueaOj
o/zVX/3Vkpe74447GB0dLUNF51IbtYhc8i4U1NlsdsHlvve979HY2Fiuss4oU68PEZHV46GHHuLw
4cPs2rWLUChENBqlqamJV199lddee40PfvCD9Pb2kkwmeeCBB9izZw8Amzdvpqenh8nJSW6//XZu
uukmfvrTn9LZ2ckTTzxBLBYrSX1l6ketqBaR5fns/32ZV46Pl3SdV22o59N3Xn3B6Z/73Oc4ePAg
+/fv56mnnuK9730vBw8ePNOF7pFHHqG5uZlEIsHb3/52PvzhD9PS0nLOOl5//XUeffRRvvKVr/CR
j3yEb33rW3zsYx8rSf06ohYRmeP6668/p5/z5z//eR5//HEAent7ef31188L6i1btrBr1y4Adu/e
zdGjR0tWT1FBbWafAv4DhQx+CbjPOXfBjoA6oBaR5VroyHel1NTUnBl+6qmn+PGPf8wzzzxDPB7n
lltumbcfdCQSOTMcDAZJJBIlq2fRk4lm1gncD3Q753YAQeDehZZR04eIrCZ1dXVMTEzMO21sbIym
pibi8TivvvoqP/vZz1a4uuKbPjwgZmYZIA4cX2jmvHJaRFaRlpYWbrzxRnbs2EEsFmP9+vVnpt12
2218+ctfZvv27VxxxRXccMMNK16fFXP0a2YPAP8dSAA/dM59dJ559gB7AGo7tu6eOH64xKWKyFp1
6NAhtm/fXukyVsx8+2tm+5xz3fPNX0zTRxPwAWALsAGoMbPzTmU65x52znU757qDXvF3hRIRkYUV
c8HLrwFvOucGnXMZ4NvAOxdaIK82ahGRkikmqH8J3GBmcSvcTeRW4NBCC6iNWkSkdBYNaufcs8Be
4HkKXfMCwMMLLZNTUouIlExRvT6cc58GPl3sSvPOkcnlCQV1zycRkYtVtiQdmUqXa9UiIpeUsgX1
sIJaRNaw2tpaAI4fP85dd9017zy33HILPT09F72tsgX1yfG1/5BKEZENGzawd+/esm6jbEHdP1K6
69xFRMrtoYce4otf/OKZ3z/zmc/wx3/8x9x6661cd911XHPNNTzxxBPnLXf06FF27NgBQCKR4N57
72X79u186EMfKtn9Pspy9zwD+kcV1CKyDN9/CE6+VNp1tl8Dt39uwVnuueceHnzwQT75yU8C8I1v
fIMf/OAH3H///dTX1zM0NMQNN9zA+9///gs+9/BLX/oS8XicQ4cOceDAAa677rqSlF+WoA4FAxxX
UIvIKvK2t72NU6dOcfz4cQYHB2lqaqK9vZ1PfepTPP300wQCAfr7+xkYGKC9vX3edTz99NPcf//9
AOzcuZOdO3eWpLayBbWaPkRkWRY58i2nu+++m71793Ly5Enuuecevva1rzE4OMi+ffsIhUJs3rx5
3luclltZ2qjDXoDekelyrFpEpGzuueceHnvsMfbu3cvdd9/N2NgY69atIxQK8eSTT3Ls2LEFl7/5
5pv5+te/DsDBgwc5cOBASeoqyxF12AswMJ4imckRDQXLsQkRkZK7+uqrmZiYoLOzk46ODj760Y9y
5513cs0119Dd3c2VV1654PKf+MQnuO+++9i+fTvbt29n9+7dJamrPEEdNLJA30iCy9fVlmMTIiJl
8dJLZ09ktra28swzz8w73+TkJFB4wO3BgwcBiMViPPbYYyWvqWxNHwC9p9X8ISJyscoU1IXmjmPD
U+VYvYjIJaUsQe0FjLqIx5EhBbWIFOdSedbqcvazbFcmbl1Xy+HByXKtXkTWkGg0yvDw8JoPa+cc
w8PDRKPRJS1XlpOJANtaa3jmyHC5Vi8ia0hXVxd9fX0MDg5WupSyi0ajdHV1LWmZ8gX1ulq+/UI/
U6ksNZGybUZE1oBQKMSWLVsqXUbVKubhtleY2f5ZP+Nm9uBiy21trQHgTbVTi4hclGIexfUL59wu
59wuYDcwDTy+2HLb/P7TaqcWEbk4Sz2ZeCtw2Dm38HWUwKaWOAGDw6cU1CIiF2OpQX0v8Oh8E8xs
j5n1mFnP4OAgES9IV1NcXfRERC5S0UFtZmHg/cA355vunHvYOdftnOtua2sDYHNrDUd10YuIyEVZ
yhH17cDzzrmBYhfY0hLn6ND0mu8bKSJSTksJ6t/gAs0eF7K5tYbJVJahST3oVkRkuYoKajOrAd4N
fHspK9/sd9FT84eIyPIVFdTOuSnnXItzbmwpK9/Sor7UIiIXq2z3+gDoaorhBYyjCmoRkWUra1B7
wQAbm+Nq+hARuQhlDWqAy5rj/FIPEBARWbYVCepjw+qiJyKyXGUP6k0tcSaSWcYSmXJvSkRkTSp7
UG9sjgOo+UNEZJlWpOkDFNQiIsu1YkF9bFhBLSKyHGUP6pqIR2ttmF4dUYuILEvZgxoK7dRq+hAR
WZ4VCepNfhc9ERFZuhUJ6sua45wYS5DO5ldicyIia8qKNX3kHRwfTazE5kRE1pSVafrw76J3TO3U
IiJLtmJNH6C+1CIiy7EiQb2uLkLYC6iLnojIMhT7hJdGM9trZq+a2SEze8eSNhKwwl301PNDRGTJ
vCLn+1/APzvn7vKfRh5f6oYua46rjVpEZBkWPaI2swbgZuCrAM65tHNudKkbuqw5Tu9p3e5URGSp
imn62AIMAn9rZi+Y2d/4D7s9h5ntMbMeM+sZHBw8byWXNceZTGUZmdbtTkVElqKYoPaA64AvOefe
BkwBD82dyTn3sHOu2znX3dbWdt5Kzt6cSY/lEhFZimKCug/oc8496/++l0JwL8llLeqiJyKyHIsG
tXPuJNBrZlf4o24FXlnqhjY2FYJaXfRERJam2F4f/wn4mt/j4whw31I3FAsHWVcX0c2ZRESWqKig
ds7tB7ovdmN6IrmIyNKtyJWJMy5riavpQ0RkiVY2qJvjnBhPksrmVnKzIiKr2ooHtXPQN6LbnYqI
FGtFg3qTuuiJiCzZigb1xmZ10RMRWaoVDeq22gixUFBd9ERElmBFg9rM1EVPRGSJVjSoodD8oaYP
EZHirXhQzxxR63anIiLFqUBQx5hO5xiaTK/0pkVEVqWVD2q/i17viJo/RESKUZGmD1AXPRGRYq14
UHf5tzvVg25FRIqz4kEdDRVud6oueiIixVnxoAbd7lREZCkqEtSdTTH6R3VjJhGRYhQV1GZ21Mxe
MrP9ZtZzsRvtaopxcixJNpe/2FWJiKx5xT6KC+BXnXNDpdhoZ2OcbN4xMJGiszFWilWKiKxZFWn6
6GoqhHO/7kstIrKoYoPaAT80s31mtme+Gcxsj5n1mFnP4ODggivr9IO6Txe9iIgsqtigvsk5dx1w
O/BJM7t57gzOuYedc93Oue62trYFVzbT3KEjahGRxRUV1M65fv/1FPA4cP3FbDQaCtJaG9EjuURE
irBoUJtZjZnVzQwD7wEOXuyGu9RFT0SkKMX0+lgPPG5mM/N/3Tn3zxe74c6mGC/3j13sakRE1rxF
g9o5dwS4ttQb7mqK8aOXB8jnHYGAlXr1IiJrRkW65wF0NcZI5/IMTaYqVYKIyKpQsaCe6aLXqxOK
IiILqtwRtX+7U51QFBFZWOWOqBt10YuISDEqFtQ1EY+meEh9qUVEFlGxoIZC84eCWkRkYRUN6o3N
MTV9iIgsouJH1P0jCZxzlSxDRKSqVTioY6SyeQYn1JdaRORCKtv04XfRU19qEZELq/gRNaiLnojI
QireRg2o54eIyAIqGtSxcJDW2rCOqEVEFlDRoAbobIrTe1pH1CIiF1LxoN7YpL7UIiILqXhQdzXF
6R9NkM+rL7WIyHyKDmozC5rZC2b23VIWsLE5RibnGJhIlnK1IiJrxlKOqB8ADpW6APX8EBFZWFFB
bWZdwHuBvyl1ATN9qXtPq51aRGQ+xR5R/0/gvwL5C81gZnvMrMfMegYHB4su4Ox9qXVELSIyn0WD
2szeB5xyzu1baD7n3MPOuW7nXHdbW1vRBURDQdbVRdTzQ0TkAoo5or4ReL+ZHQUeA95lZv+nlEVs
bFZfahGRC1k0qJ1zf+Cc63LObQbuBf7VOfexUhaxqTnOseGpUq5SRGTNqHg/aoAtrTUcH0uSSOcq
XYqISNVZUlA7555yzr2v1EVsaasB4KiOqkVEzlM1R9QARwYV1CIic1VZUE9WuBIRkepTFUEdD3ts
aIhyZEhH1CIic1VFUANsW1fLYR1Ri4icp3qCuq2Ww6cm9URyEZE5qiioa5hK5zilJ5KLiJyjaoJ6
a1stAIdPqflDRGS2qgnqbTNBrXZqEZFzVE1Qr6+PUBMOclh9qUVEzlE1QW1m6vkhIjKPqglqgK2t
Nbo6UURkjqoK6m1ttfSPJphOZytdiohI1aiuoF5XOKGoo2oRkbOqKqjfur4Q1K8NTFS4EhGR6lFV
Qb25pYawF+DVkwpqEZEZVRXUXjDAFevrOHRivNKliIhUjWIebhs1s+fM7EUze9nMPlvOgq5sr+PQ
CR1Ri4jMKOaIOgW8yzl3LbALuM3MbihXQds76hmaTDGoe36IiADFPdzWOedmrkIJ+T9lu8Xd9o56
AA72j5VrEyIiq0pRbdRmFjSz/cAp4EfOuWfnmWePmfWYWc/g4OCyC7p2YwPBgLHv2Miy1yEispYU
FdTOuZxzbhfQBVxvZjvmmedh51y3c667ra1t2QXFwx5XddTTc+z0stchIrKWLPUp5KPAk8Bt5Smn
YPemJl7sHSOby5dzMyIiq0IxvT7azKzRH44B7wZeLWdR125sIJHJ6U56IiIUd0TdATxpZgeAn1No
o/5uOYu6prMRgAN9o+XcjIjIquAtNoNz7gDwthWo5YytrTXURjwO9I1xd/fGldy0iEjVqaorE2cE
AsaOznoOqIueiEh1BjXAzq5GDp0YJ53VCUURubRVbVBf09lAOpvXnfRE5JJXtUG9s6sBgP29OqEo
Ipe2qg3qy5rjrKuL8OybuvBFRC5tVRvUZsY7t7XwzOEhnCvbrUVERKpe1QY1wDu3tTI0mea1AT2Z
XEQuXdUd1Je3APDTw0MVrkREpHKqOqi7muJc1hznp4eHK12KiEjFVHVQA7xzWws/OzJMLq92ahG5
NFV9UL9jWwsTyay66YnIJavqg/qWK9YRChrff+lEpUsREamIqg/qhliIm9/SxvdeOqFueiJySar6
oAZ4784Ojo8leUHNHyJyCVoVQf1rV60n4gX4Zk9vpUsREVlxqyKo66Mh7rx2A9998QTJTK7S5YiI
rKhiHsW10cyeNLNXzOxlM3tgJQqb6wO7NjCRyvLUL05VYvMiIhVTzBF1FvjPzrmrgBuAT5rZVeUt
63zv2NpCa22EJ/YfX+lNi4hU1KJB7Zw74Zx73h+eAA4BneUubC4vGOB9Ozv4l1dPMZbIrPTmRUQq
Zklt1Ga2mcLzE5+dZ9oeM+sxs57BwcHSVDfHXbu7SGfzPPrcL8uyfhGRalR0UJtZLfAt4EHn3Pjc
6c65h51z3c657ra2tlLWeMaOzgZuuryVR37yJqmsTiqKyKWhqKA2sxCFkP6ac+7b5S1pYf/xV7Zx
aiLF48/3V7IMEZEVU0yvDwO+Chxyzv15+Uta2I2Xt7Cjs56/fvqIbtQkIpeEYo6obwR+E3iXme33
f+4oc10XZGZ84lcu582hKX748slKlSEismK8xWZwzv0EsBWopWi37Whnc0ucL//bYW7b0U7hoF9E
ZG1aFVcmzhUMGL/7K9t4sW+Mb6mtWkTWuFUZ1AAf6d7I7k1NfO77h5hIql+1iKxdqzaogwHj03de
xdBkmi88+UalyxERKZtVG9QAO7sauWt3F3/9b0d48lXdA0RE1qZVHdQAf3jHdtrro9z3dz/nYP9Y
pcsRESm5VR/UTTVhvvJb3US8AP/+b39O/2ii0iWJiJTUqg9qgGu6Gvin+28ilcnxm199luMKaxFZ
Q9ZEUANcvq6OP/vItRwbnubOv/wJx4anKl2SiEhJrJmgBnjP1e08/nvvZDyZ4Xf+voeTY8lKlyQi
ctHWVFBDoSfIlz66m76Rae78wk949shwpUsSEbkoay6oofAw3O/8/k3EQkHuefhn/MG3XyKby1e6
LBGRZVmTQQ3w1vV1/NP9N/HbN27h0ed+ycf/9jlOT6UrXZaIyJKt2aAGqIuG+KM7r+JPP7yTnx8d
4c6//AkvH1dfaxFZXdZ0UM/4yNs38s3ffQe5vOMDX/h/PPDYC+zd10cyo6fEiEj1M+dKf/P97u5u
19PTU/L1XqzBiRR/8s+v8m/7X6Mjf4LJyDruuOEaLl/fyO3XtBPxgpUuUUQuUWa2zznXPd+0Re9H
vZa01UX4H3dfS/Kth4k+/t8AyD4T4BSNvPpEC8GGTrzGDUSaN7Kucws1rRuhfgPUbYBQtMLVi8il
atGgNrNHgPcBp5xzO8pfUvlFt90Ev/EYjPeTGOwlMNRLoP8I0dHXWTf6LPXHEvDCucvkIo1Q30mw
YQPUd0B9J9R1FIK8fkNhONYEeoiBiJRYMUfUfwd8AfiH8paygmrXwRW3A1Dn/7QD2Vyek+NJXjo+
QM9Lr/DG4deIJk7S5k7Tnh2hY/o0m06/SYc9T232NMacZiMv6od3ZyHMzxn2A712PQQvqS8yInKR
inkU19Nmtrn8pVSeFwzQ1RSnq2kLN169BXgvE8kML/aOMTCe5ODpaf7ilQFePzWBy2VYxyjtdpp2
O817NuZ5a2ycVnea+swgkd7nYOIElpvTJdACULPu3CPx+YbDNRX5NxCR6lPUyUQ/qL+7UNOHme0B
9gBcdtllu48dO1aiEqvTqYkkzx8bYXAixfcPnuTF3lGm0md7kQQDRi6fp4kJOuw03S0proxPcEV8
gg2BEZpzwzB+nODUCbz0+Hnrz0fqoW4Dgbr1EKkrBPeZn9rih4NhNceIrAILnUwsWVDPVq29Pspp
KpVlcCLFm8NT/PzN05hBwIypVCG8jwxNcrB/nKHJ1HnLxkjSbiO022nWM0KHnWa9nabdRtjgTdDo
pYmTJEaCcC5BKL+Ee5gEvEUCfRnTvBgELomenSIrRr0+VkBNxKMm4rG5tYZfvWLdvPM45xgYT/Fi
3yhvDk3R0RAlm3OEvQBhL0Dv6WkAxhMZjiQyjMTDfLN/jDeHp0hn85wYS5LLOwLkiZGiyUvT6KXp
qsmzqQ421uRJJSbYXOeoC6SI5hNEXZKoS1AXSOFlpyE9RSSTIJw6AempWT+T4Iq9zN7mhHkNhOKF
o3cvAsEIeOE5r5ElTovMWp//6kXPDgc8fVOQS4aCegWZGe0NUdob2pe1fDqb59REkkMnJjgyOMnx
0QTZvGNoMsV3+8fpP5ogHu5iOr34hTxtdRFSmRzpXJ7L19WCc4RJ41KF4J6eHKOGJJc3Gt0bwmyq
ddQFU7jUFF5ummBmiriliLkEwew02eQUwXyCWGCCEBksm4ZcCrL+Ty5deHWlusjI5g/3+QJ+Zh4v
CsEQBEL+q1d4DYb9cd7ZacXOF/AK42fPF/DnnTufPlhkmYrpnvcocAvQamZ9wKedc18td2FyvrA3
c7IzDqw/b3oykyMaCjKZyjKdypLO5UlmckylchwenCTvoDbi8capCV4+Pk5jPMTAeIp0Nk86m6d/
NMvVGzoJeQE2t8Spi4b4l0MDfOe1MZKZ4m9q1RQPsbWtlvaWKKPTaQbGU1zZXkdd1KOrIUx7bYB6
L0/QZQjk0gRdipGxSaYT06SSCeLBHB01AcJk6KwP0Bx2RCxLIDcT/rM+BGY+ALKp88fl0jA95f+e
9JdLQz4Duaz/mim8roSZUC/mQyHggQUL4R4IFk5CW3DWsP8TCPrzBc6dNnuZhaYtZ33nLGOAFV4t
cHb4nNfAPONY4vyz18/ytrmKXVJXJsryJDM5jo8mGE9mqQkHMTOy+TzjiSzjiQw552iMhUhkchwZ
nOLQiXGODk9xZHCK4ak0V7bXMZnKkszk522jny0aCpDO5snP7fkYMDqbYqQyebygkcnlecu6OtbV
RUjn8gyMJ3ltYJKwFyCZyVET9qiLeoSCAYIBoyEWoiEeIpXJkc07aiMe6+uj1EdDhILw1rYYW5rC
dNR5xINuVqBnIJ8tvObSZ4fPCfv0ufPNLHfefBdYx7wfHllwrvANxOUhn5s1nC+8upw/Pj9n2uxl
ZubLz5kvV8JvN1IK9tnxizuZuFQKarmQ0el0IeCTGVLZHLk8TCQz7OxqpCEWIuwFmEplOTmeJJHO
8drABIMTKU5Pp/nl8DSxUJDxZJaAwRunJknn8phBa22Ey5rjxMNBIl6QsUSGZCZHIpMjl3cMTaaZ
TGWoCXvk8o6TY0kmUtl5a9zaWkNnUwwvYNRFQwxPFb51xMMeU6nsmXMKBmxqqSEaChL2AkS8AO31
UdrqItREPJKZHGEvQH00RF3UI5PL0xgP0xALndmWc450Ls9EMktrbWSF3oVZnJvzIVBEuJ/zgeAA
N+s1P884N/+4BefPzxnHAvMvtA5/H6uew669R0EtMptzjlS28B85l3e8cmKcvpFpfjmc4ODxMU6O
JUllC81GjfEQES/AWCJDW12ETM6RzhaalXpHpsnmHDnnis6D1tqzYX1seJqs//VhXV2E5powyUyO
6XSOSCjAzq5GOhtjtNSESWfzTKazTCaz1EVDbGyOARD1Ch8UoWDhQ+65N08znszQEAvRUhsmHvYY
GE9iQDQUpK2u8IGQyuZJZfNEvAChYOFbx9a2WjoaoiQzOSJekLqoR94V/r3qYyFGptJk8o6pVJbj
owma4mF2djVgan+/aOr1ITKHmRENnb0J19s3N/P2zc1LXk8mlydgRjKTI+8cfSMJBidS5J3DCxSa
YYanCs09wUCAgfEkfSPTDIynMOCGrS3Ew0FqIyF6R6YZnU4TD3vEw0EmklkO9I3y41cGznyohL0A
8XCQqVSWTG7+T4baiEdHQ5ShyRRjiQx5V1guEgyQzuXPrKtU6qMeHQ0x/8Oq8A2hqzFOLBxkU0sc
L2AEzGiIh8jlHJOpLJ1NsTPnT5LZHDMX+U6nc/SNTBOPeCTTOeIRjw2NUd4YmMTMqI8VmqzCwQAB
MxrjISaSGbxg4RtN2AsQ9YIEA0ZtxKM+FiKbzxMOBoiHPdK5PE3xENFQkHQ2T9grrKdw3YMjk8uT
yzsa4iFqwh6j02lOT6VJZfMMTqSIh4N0NMRYVx8hGgoWvu1NpRlPZsjnCz24YuEgrbURJpNZhqdS
tNVGz8w/H+ccubltfXMoqEUuQihY6E9eEyn8V9reEWJ7R2m34Vwh3KKh4JntJTM5xhKFk6BTqSzZ
fOEoPxYO0tkYOycURqbS1MdCBAOFo95T40kiXhAvaMTDwcI3hFye4ckUR4enOT6aIB4uBNmk3zyU
zTnGk4VvFBEvQE3Eoz4a4sRYgv29o5waT+EFC4GXz8PxsQQj02mePTJM3kF+1jcYL2BnvkVA4eIw
z68tGgqyvj5CNu/86xCyDIwn2dpWixcwRvrTDE2mFw22lTAT7sVqiodo8M/lAEwks2c+5LOLrEdN
HyKyIqZSWbygEQoEGJ5KEw0VTvTGwwsfL2Zzebxg4LxxGf/ovC5aOFLO+R84iXSOnHNMJrOMJgpH
uoGAMZ3KEgwY48ksqWyOUCBAMps7880kHDTCXoBMzjGWyDCdztIUDxeO4L0ALTVhJvwL206MJklk
crTWhmlvKJyUDgaMdDbPdDrHwHiS2ohHa12Yock0A2NJTo4nOT2VPtOc1BgL4YCAQW0kxIPvfqua
PkSksma+dQBn2smLMTekZ8Z5QYiFC98cLtSssJo8uMA0XQcsIlLlFNQiIlVOQS0iUuUU1CIiVU5B
LSJS5RTUIiJVTkEtIlLlFNQiIlVOQS0iUuUU1CIiVa6ooDaz28zsF2b2hpk9VO6iRETkrEWD2syC
wBeB24GrgN8ws6vKXZiIiBQUc0R9PfCGc+6Icy4NPAZ8oLxliYjIjGLuntcJ9M76vQ/4d3NnMrM9
wB7/15SZHbz48iqqFRiqdBElsBb2Q/tQPdbCflTrPmy60ISS3ebUOfcw8DCAmfVc6L6qq8Va2AdY
G/uhfagea2E/VuM+FNP00Q9snPV7lz9ORERWQDFB/XPgLWa2xczCwL3Ad8pbloiIzFi06cM5lzWz
3wd+AASBR5xzLy+y2MOlKK7C1sI+wNrYD+1D9VgL+7Hq9qEsz0wUEZHS0ZWJIiJVTkEtIlLlShrU
q+lSczM7amYvmdl+M+vxxzWb2Y/M7HX/tckfb2b2eX+/DpjZdRWs+xEzOzW7n/py6jazj/vzv25m
H6+CffiMmfX778d+M7tj1rQ/8PfhF2b267PGV/Tvzcw2mtmTZvaKmb1sZg/441fN+7HAPqyq98PM
omb2nJm96O/HZ/3xW8zsWb+mf/Q7RGBmEf/3N/zpmxfbv4pyzpXkh8KJxsPAViAMvAhcVar1l/oH
OAq0zhn3p8BD/vBDwJ/4w3cA3wcMuAF4toJ13wxcBxxcbt1AM3DEf23yh5sqvA+fAf7LPPNe5f8t
RYAt/t9YsBr+3oAO4Dp/uOhCkVIAAAMqSURBVA54za931bwfC+zDqno//H/TWn84BDzr/xt/A7jX
H/9l4BP+8O8BX/aH7wX+caH9W8m/q/l+SnlEvRYuNf8A8Pf+8N8DH5w1/h9cwc+ARjPrqESBzrmn
gdNzRi+17l8HfuScO+2cGwF+BNxW/uoLLrAPF/IB4DHnXMo59ybwBoW/tYr/vTnnTjjnnveHJ4BD
FK7kXTXvxwL7cCFV+X74/6aT/q8h/8cB7wL2+uPnvhcz79Fe4FYzMy68fxVVyqCe71Lzhd7wSnPA
D81snxUufwdY75w74Q+fBNb7w9W+b0utu1r35/f9JoFHZpoLWCX74H91fhuFI7lV+X7M2QdYZe+H
mQXNbD9wisKH3WFg1DmXnaemM/X608eAFqpgP+ZzKZ9MvMk5dx2FuwJ+0sxunj3RFb4Hrbq+i6u1
buBLwDZgF3AC+LPKllM8M6sFvgU86Jwbnz1ttbwf8+zDqns/nHM559wuCldPXw9cWeGSSqaUQb2q
LjV3zvX7r6eAxym8sQMzTRr+6yl/9mrft6XWXXX745wb8P+j5YGvcPbrZlXvg5mFKATc15xz3/ZH
r6r3Y759WK3vB4BzbhR4EngHhealmQv7Ztd0pl5/egMwTBXtx2ylDOpVc6m5mdWYWd3MMPAe4CCF
emfOuH8ceMIf/g7wW/5Z+xuAsVlfbavBUuv+AfAeM2vyv9K+xx9XMXPa/D9E4f2Awj7c65+l3wK8
BXiOKvh789s0vwoccs79+axJq+b9uNA+rLb3w8zazKzRH44B76bQ3v4kcJc/29z3YuY9ugv4V//b
z4X2r7JKeWaSwlnt1yi0Df1hJc+SLlLnVgpndl8EXp6plUIb1b8ArwM/Bprd2TPKX/T36yWgu4K1
P0rhq2iGQvvZ7yynbuC3KZwoeQO4rwr24X/7NR6g8J+lY9b8f+jvwy+A26vl7w24iUKzxgFgv/9z
x2p6PxbYh1X1fgA7gRf8eg8Cf+SP30ohaN8AvglE/PFR//c3/OlbF9u/Sv7oEnIRkSp3KZ9MFBFZ
FRTUIiJVTkEtIlLlFNQiIlVOQS0iUuUU1CIiVU5BLSJS5f4/Sljlo1+EreoAAAAASUVORK5CYII=
" />
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Better model found at epoch 1 with valid_loss value: 0.9542644023895264.
Better model found at epoch 2 with valid_loss value: 0.8755126595497131.
Better model found at epoch 3 with valid_loss value: 0.8288350701332092.
Better model found at epoch 4 with valid_loss value: 0.7948615550994873.
Better model found at epoch 5 with valid_loss value: 0.7777946591377258.
Better model found at epoch 6 with valid_loss value: 0.7700592279434204.
Better model found at epoch 7 with valid_loss value: 0.7698812484741211.
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Evaluate-your-model">Evaluate your model<a class="anchor-link" href="#Evaluate-your-model"> </a></h1>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let us now evaluate your trained model on some of your validation set to see how well it generates comments.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">get_predictions</span><span class="p">(</span><span class="n">learn</span><span class="p">,</span> <span class="n">ds_type</span><span class="o">=</span><span class="n">DatasetType</span><span class="o">.</span><span class="n">Valid</span><span class="p">):</span>
<span class="n">learn</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span> <span class="o">=</span> <span class="p">[],[],[]</span>
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
<span class="k">for</span> <span class="n">xb</span><span class="p">,</span><span class="n">yb</span> <span class="ow">in</span> <span class="n">progress_bar</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">dl</span><span class="p">(</span><span class="n">ds_type</span><span class="p">)):</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">learn</span><span class="o">.</span><span class="n">model</span><span class="p">(</span><span class="o">*</span><span class="n">xb</span><span class="p">)</span>
<span class="k">for</span> <span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">z</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">xb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">xb</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="n">out</span><span class="p">):</span>
<span class="n">inputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">x</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cpu</span><span class="p">()))</span>
<span class="n">targets</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">y</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">cpu</span><span class="p">()))</span>
<span class="n">outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">y</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="mi">1</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span> <span class="o">=</span> <span class="n">get_predictions</span><span class="p">(</span><span class="n">learn</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">print_results</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span><span class="p">,</span> <span class="n">method_spm</span><span class="p">,</span> <span class="n">comment_spm</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">10</span><span class="p">):</span>
<span class="sd">"""Just a little helper function for printing out the results from your model."""</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Input:"</span><span class="p">,</span> <span class="s2">" "</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">decode_spec_tokens</span><span class="p">(</span><span class="n">method_spm</span><span class="o">.</span><span class="n">DecodePieces</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">inputs</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))),</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Target:"</span><span class="p">,</span> <span class="s2">" "</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">decode_spec_tokens</span><span class="p">(</span><span class="n">comment_spm</span><span class="o">.</span><span class="n">DecodePieces</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">targets</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))),</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Predicted:"</span><span class="p">,</span> <span class="s2">" "</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">decode_spec_tokens</span><span class="p">(</span><span class="n">comment_spm</span><span class="o">.</span><span class="n">DecodePieces</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">outputs</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))),</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="n">print_results</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span><span class="p">,</span> <span class="n">method_spm</span><span class="p">,</span> <span class="n">comment_spm</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Input: xxbos @doesservicerequest private void putrangeinternal(final filerange range, final filerangeoperationtype operationtype, final byte[] data, final long length, final String md5, final accesscondition accesscondition, final filerequestoptions options, final operationcontext opcontext) throws storageexception { executionengine.executewithretry(this.fileserviceclient, this, putrangeimpl(range, operationtype, data, length, md5, accesscondition, options, opcontext), options.getretrypolicyfactory(), opcontext); } xxeos
Target: xxbos Used for both uploadrange and clearrange. xxeos
Predicted: xxbos Put to creating thes( s( xxeos
Input: xxbos public static byte[] encodesequence(byte[]... encodedvalues) { int length = 0; for (byte[] encodedvalue : encodedvalues) { length += encodedvalue.length; } byte[] lengthencoded = encodelength(length); bytearraydataoutput out = bytestreams.newdataoutput(1 + lengthencoded.length + length); out.write(sequence_tag); out.write(lengthencoded); for (byte[] entry : encodedvalues) { out.write(entry); } return out.tobytearray(); } xxeos
Target: xxbos Encodes a sequence of encoded values. xxeos
Predicted: xxbos Encodes a byte of bytes bytes into xxeos
Input: xxbos @override public String dnsresolveex(string host) { stringbuilder result = new stringbuilder(); try { inetaddress[] list = inetaddress.getallbyname(host); for (inetaddress inetaddress : list) { result.append(inetaddress.gethostaddress()); result.append("; "); } } catch (unknownhostexception e) { log.log(level.fine, "DNS name not resolvable {0}.", host); } return result.tostring(); } xxeos
Target: xxbos *********************************************************************** dnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveexdnsresolveex xxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeosxxeos
Predicted: xxbos xxmap 51 * namepp =pxxeos
Input: xxbos protected void removeallfromattributevalueset() { final collection<abstracthtml5sharedobject> sharedobjects = getsharedobjects(); boolean listenerinvoked = false; final collection<writelock> writelocks = lockandgetwritelocks(); try { getattributevalueset().clear(); setmodified(true); invokevaluechangelisteners(sharedobjects); listenerinvoked = true; } finally { for (final Lock lock : writelocks) { lock.unlock(); } } pushqueues(sharedobjects, listenerinvoked); } xxeos
Target: xxbos clears all values from the value set. xxeos
Predicted: xxbos s all attribute from the xxeos
Input: xxbos public void registercheckwithnotes(string checkid, String name, String script, long interval, @suppresswarnings("sameparametervalue") String notes) { Check check = new Check(); check.setid(checkid); check.setname(name); check.setscript(script); check.setinterval(string.format(" ⁇ ss", interval)); check.setnotes(notes); registercheck(check); } xxeos
Target: xxbos Registers a Health Check with the Agent. xxeos
Predicted: xxbos Registers a xxupj ealth checkxxmaj check a givenxxmaj name xxeos
Input: xxbos public void assertequalsignoringcase(@nullable Description description, @nullable String actual, @nullable String expected) { if (!areequalignoringcase(actual, expected)) { String format = "expecting:< ⁇ s> to be equal to:< ⁇ s>, ignoring case considerations"; throw failures.failure(description, new basicerrormessagefactory(format, actual, expected)); } } xxeos
Target: xxbos Verifies that two s are equal, ignoring case considerations. xxeos
Predicted: xxbos Assert that the stringsxx are equal, oring the... xxeos
Input: xxbos protected cronschedulebuilder createcronschedulebuilder(string cronexpr) { int i = cronexpr.indexof("["); int j = cronexpr.indexof("]"); timezone timezone = defaulttimezone; if (i > -1 && j > -1) { timezone = timezone.gettimezone(cronexpr.substring(i+1, j)); cronexpr = cronexpr.substring(0, i).trim(); } return cronschedulebuilder.cronschedule(cronexpr).intimezone(timezone); } xxeos
Target: xxbos Allow timezone to be configured on a per-cron basis with [timezonename] appended to the cron format xxeos
Predicted: xxbos Create to to create used to the mtimei-s of a0],]. to the givenath.xxeos
Input: xxbos private <T> fakeencodeditem readnextitem(class<t> clazz) { fakeencodeditem item = data[dataposition]; if (item == null) { / / While Parcel will treat these as zeros, in tests, this is almost always an error. throw new unreliablebehaviorerror("reading uninitialized data at position " + dataposition); } checkconsistentreadandincrementposition(clazz, item); return item; } xxeos
Target: xxbos Reads a complete item in the byte buffer. xxeos
Predicted: xxbos Read the from the given array. xxeos
Input: xxbos private void hidesuggestionsifnecessary(final @nonnull querytoken currentquery, final @nonnull tokensource source) { String queryts = currentquery.gettokenstring(); String currentts = source.getcurrenttokenstring(); if (!iswaitingforresults(currentquery) && queryts != null && queryts.equals(currentts)) { msuggestionsvisibilitymanager.displaysuggestions(false); } } xxeos
Target: xxbos Hides the suggestions if there are no more incoming queries. xxeos
Predicted: xxbos Check the givenion of of the is no more . xxeos
Input: xxbos public list<uirow> getvalues() throws efapsexception { list<uirow> ret = new arraylist<>(); if (isfiltered()) { for (final uirow row : this.values) { boolean filtered = false; for (final tablefilter filter : this.filters.values()) { filtered = filter.filterrow(row); if (filtered) { break; } } if (!filtered) { ret.add(row); } } } else { ret = this.values; } setsize(ret.size()); return ret; } xxeos
Target: xxbos This is the getter method for the instance variable . xxeos
Predicted: xxbos Returns method a first method for the row of. xxeos
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This is great and all. However, you can see the text looks a bit off: the model sometimes starts generating one word and then switches halfway through. This is because you are currently generating tokens using <a href="https://towardsdatascience.com/what-is-teacher-forcing-3da6217fed1c">Teacher Forcing</a>, which means you are giving the model the ground truth for what it should have produced even when it did not. This is very helpful during training; however, it requires having both the x and the y of an input. In a real-world setting, you obviously aren't going to be given the y!</p>
<p>Therefore, I found a hacky way of bypassing the need for the y. It uses a fake y: an array filled with ones that is updated every time the model makes a prediction and is then fed back into the model, so that the model knows what it has generated so far.</p>
<p><strong>Heads Up:</strong> The way I coded this is extremely inefficient, so generating predictions will take a long time. Therefore, I recommend only generating a few comments (I set it up to only do 10).</p>
<p><strong>TODO For You:</strong> Come up with a more efficient solution that performs similarly to the Teacher Forcing approach of the above code.</p>
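<p>To make that hack concrete, here is a minimal, framework-free sketch of the decoding loop. Everything in it (the <code>toy_model</code> function and the token ids) is made up for illustration; in the notebook, the real Sequence to Sequence model plays the role of <code>toy_model</code> and the fake y lives on the GPU as a tensor.</p>

```python
# Hypothetical sketch (not the notebook's exact code): decode without
# teacher forcing by starting from a fake "y" filled with ones and
# feeding the model's own predictions back in, one position at a time.

def toy_model(x, y_so_far):
    # Stand-in for the real Seq2Seq model: at each position i it
    # "predicts" x[i] + 1. A real model would attend over y_so_far.
    return [tok + 1 for tok in x]

def greedy_decode(x, max_seq=8):
    res = [1] * max_seq              # fake y: all ones, like torch.zeros(...) + 1
    for i in range(max_seq - 1):
        outs = toy_model(x, res)     # model sees its own partial output
        res[i + 1] = outs[i]         # keep only the prediction for step i
    return res

print(greedy_decode([5, 6, 7, 8, 9, 10, 11, 12]))  # → [1, 6, 7, 8, 9, 10, 11, 12]
```

<p>The key point is that position <code>i + 1</code> of the fake y is filled in only after the model has produced positions up to <code>i</code> of its own output.</p>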
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>P.S.</p>
<p>For other <a href="https://docs.fast.ai/text.learner.html#LanguageLearner.predict">language learners</a> provided by FastAI, you can simply use the <code>predict</code> function: pass in some text and ask the model to predict the next set of tokens. However, I have been unsuccessful in implementing this <code>predict</code> function for Sequence to Sequence models. So, another <strong>TODO For You</strong> is to see if you can implement a <code>predict</code> function for Sequence to Sequence models so that you can easily generate comments for any method you pass to it!</p>
<p>If you do figure out a way to do this, I would be extremely interested! So, feel free to leave a comment about it.</p>
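<p>As a starting point for that TODO, here is a toy sketch of what such a <code>predict</code> function could look like: numericalize the text, run the same fill-in-the-fake-y loop, then turn the ids back into text. The vocabulary and <code>model_step</code> below are made-up stand-ins, not the FastAI API.</p>

```python
# Hypothetical predict() sketch for a Seq2Seq model. `vocab` and
# `model_step` are toy stand-ins; a real version would use the learner's
# vocabulary and model instead.

vocab = ["xxpad", "xxbos", "returns", "the", "value", "xxeos"]
stoi = {tok: i for i, tok in enumerate(vocab)}

def model_step(src_ids, out_ids):
    # Toy "model": preds[i] is the token predicted to follow position i.
    # Here it always emits the same canned comment, for illustration.
    canned = [stoi[t] for t in ["returns", "the", "value", "xxeos"]]
    return canned + [0] * (len(out_ids) - len(canned))

def predict(text, max_seq=6):
    src_ids = [stoi.get(tok, 0) for tok in text.split()]
    out = [stoi["xxbos"]] * max_seq          # fake y, seeded with xxbos
    for i in range(max_seq - 1):
        preds = model_step(src_ids, out)
        out[i + 1] = preds[i]
        if out[i + 1] == stoi["xxeos"]:      # stop once the model emits eos
            break
    return " ".join(vocab[t] for t in out[: i + 2])

print(predict("public int getvalue ( )"))   # → xxbos returns the value xxeos
```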
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<details class="description" open="">
<summary class="btn btn-sm" data-open="Hide Code" data-close="Show Code"></summary>
<p><div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">get_preds</span><span class="p">(</span><span class="n">learn</span><span class="p">,</span> <span class="n">db_tst</span><span class="p">,</span> <span class="n">max_seq</span> <span class="o">=</span> <span class="mi">128</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">10</span><span class="p">):</span>
<span class="n">learn</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
<span class="n">inpts</span><span class="p">,</span> <span class="n">trgts</span><span class="p">,</span> <span class="n">preds</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">xb</span><span class="p">,</span><span class="n">yb</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">progress_bar</span><span class="p">(</span><span class="n">db_tst</span><span class="o">.</span><span class="n">dl</span><span class="p">(</span><span class="n">DatasetType</span><span class="o">.</span><span class="n">Train</span><span class="p">))):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">>=</span> <span class="n">n</span><span class="p">:</span> <span class="k">break</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">xb</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">max_seq</span><span class="p">,</span> <span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">))</span><span class="o">.</span><span class="n">long</span><span class="p">()</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_seq</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">outs</span> <span class="o">=</span> <span class="n">learn</span><span class="o">.</span><span class="n">model</span><span class="p">(</span><span class="n">xb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">res</span><span class="p">)</span>
<span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">out</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">outs</span><span class="p">):</span>
<span class="n">res</span><span class="p">[</span><span class="n">j</span><span class="p">][</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">out</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="n">i</span><span class="p">]</span>
<span class="k">for</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">xb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">yb</span><span class="p">,</span> <span class="n">res</span><span class="p">):</span>
<span class="n">inpts</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">x</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cpu</span><span class="p">())))</span>
<span class="n">trgts</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">db_tst</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">y</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">cpu</span><span class="p">())))</span>
<span class="n">preds</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">learn</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">train_ds</span><span class="o">.</span><span class="n">y</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">cpu</span><span class="p">())))</span>
<span class="k">return</span> <span class="n">inpts</span><span class="p">,</span> <span class="n">trgts</span><span class="p">,</span> <span class="n">preds</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span> <span class="o">=</span> <span class="n">get_preds</span><span class="p">(</span><span class="n">learn</span><span class="p">,</span> <span class="n">db_tst</span><span class="p">)</span>
<span class="n">print_results</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span><span class="p">,</span> <span class="n">outputs</span><span class="p">,</span> <span class="n">method_spm</span><span class="p">,</span> <span class="n">comment_spm</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</p>
</details>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea ">
<div>
<style>
/* Turns off some styling */
progress {
/* gets rid of default border in Firefox and Opera. */
border: none;
/* Needs to be in here for Safari polyfill so background images work as expected. */
background-size: auto;
}
.progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {
background: #F44336;
}
</style>
<progress value='10' class='' max='149' style='width:300px; height:20px; vertical-align: middle;'></progress>
6.71% [10/149 01:14<17:21]
</div>
</div>
</div>
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>Input: xxbos protected String parseunquotedstringcontent() { final int startndx = ndx; while (true) { final char c = input[ndx]; if (c <= ' ' || charutil.equalsone(c, UNQUOTED_DELIMETERS)) { final int currentndx = ndx; / / done skipwhitespaces(); return new String(input, startndx, currentndx - startndx); } ndx++; } } xxeos
Target: xxbos Parses un-quoted string content. xxeos
Predicted: xxbos Parses the text from the HTML text. xxeos
Input: xxbos private static void checkfilecopy(final File srcfile, final File destfile) throws ioexception { checkexists(srcfile); checkisfile(srcfile); if (equals(srcfile, destfile)) { throw new ioexception("files '" + srcfile + "' and '" + destfile + "' are equal"); } File destparent = destfile.getparentfile(); if (destparent != null && !destparent.exists()) { checkcreatedirectory(destparent); } } xxeos
Target: xxbos Checks that file copy can occur. xxeos
Predicted: xxbos Checks if the file is a file. xxeos
Input: xxbos long analyze() { Arc a; Arc aa; if (pre.outs == null) { return flags.reg_uimpossible; } for (a = pre.outs; a != null; a = a.outchain) { for (aa = a.to.outs; aa != null; aa = aa.outchain) { if (aa.to == post) { return flags.reg_uemptymatch; } } } return 0; } xxeos
Target: xxbos analyze - ascertain potentially-useful facts about an optimized NFA xxeos
Predicted: xxbos Returns the Syoooo Syna Syna Syna Sa Sa Sa Sa Sa Sa Sa Sa syna Syna xxeos
Input: xxbos @suppresswarnings("unchecked") public REC next() { checkdirection(true); orecord record; / / ITERATE UNTIL THE NEXT GOOD RECORD while (hasnext()) { / / FOUND if (currentrecord != null) { try { return (REC) currentrecord; } finally { currentrecord = null; } } record = gettransactionentry(); if (record != null) return (REC) record; } return null; } xxeos
Target: xxbos Return the element at the current position and move forward the cursor to the next position available. xxeos
Predicted: xxbos Returns the next record in the queue. xxeos
Input: xxbos public static void addtransitivematches(hollowreadstateengine stateengine, map<string, bitset> matches) { list<hollowschema> schemalist = hollowschemasorter.dependencyorderedschemalist(stateengine); collections.reverse(schemalist); for(hollowschema schema : schemalist) { bitset currentmatches = matches.get(schema.getname()); if(currentmatches != null) { addtransitivematches(stateengine, schema.getname(), matches); } } } xxeos
Target: xxbos Augment the given selection by adding the references, and the <i>transitive< / i> references, of our selection. xxeos
Predicted: xxbos Add a variable to the list of s. xxeos
Input: xxbos protected void resolvenestedproperties(final beanproperty bp) { String name = bp.name; int dotndx; while ((dotndx = indexofdot(name)) != -1) { bp.last = false; bp.setname(name.substring(0, dotndx)); bp.updatebean(getindexproperty(bp)); name = name.substring(dotndx + 1); } bp.last = true; bp.setname(name); } xxeos
Target: xxbos Resolves nested property name to the very last indexed property. If forced, <code>null< / code> or non-existing properties will be created. xxeos
Predicted: xxbos Resolve the property name. xxeos
Input: xxbos public xmlconfig declarenamespace(string prefix, String namespaceuri) { validate.notempty(prefix, "prefix cannot be empty"); validate.notempty(namespaceuri, "namespace URI cannot be empty"); map<string, String> updatednamespaces = new hashmap<string, string>(declarednamespaces); updatednamespaces.put(prefix, namespaceuri); return new xmlconfig(features, updatednamespaces, properties, validating, true, allowdoctypedeclaration, true); } xxeos
Target: xxbos Declares a namespace and also sets } to <code>true< / code>. <p / > <p>note that you cannot use this to add namespaces for the matcher. This has to be done by providing a to the matcher instance.< / p> xxeos
Predicted: xxbos Creates a new instance of the given name and the given name. xxeos
Input: xxbos protected static int getshadowradius(drawable shadow, Drawable circle) { int radius = 0; if (shadow != null && circle != null) { Rect rect = new Rect(); radius = (circle.getintrinsicwidth() + (shadow.getpadding(rect) ? rect.left + rect.right : 0)) / 2; } return Math.max(1, radius); } xxeos
Target: xxbos Calculates required radius of shadow. xxeos
Predicted: xxbos Get the Syyyy Syyy Syyy Syna Syna Syna Syna Syna Sna Syna Syna Syna xxeos
Input: xxbos void addadviceclinitmethod(final String name) { if (adviceclinits == null) { adviceclinits = new arraylist<>(); } adviceclinits.add(name); } xxeos
Target: xxbos Saves used static initialization blocks (clinit) of advices. xxeos
Predicted: xxbos Adds a Java class to the Saa Sa Sa Syyetch Bean. xxeos
Input: xxbos public static String padleft(string s, int desiredlength, String padstring) { while (s.length() < desiredlength) { s = padstring + s; } return s; } xxeos
Target: xxbos Pad the given string with padstring on the left up to the given length. xxeos
Predicted: xxbos Compares two strings, and returns the first character in the string. xxeos
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Not too shabby if I do say so myself! The model seems to actually be learning what a comment is supposed to contain when documenting what a method does. Of course, there are a lot of tweaks you could make, such as adding the ability to generate inline comments instead of just method-level ones, using more data, using different sampling schemes for generating the comments such as <a href="https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277">top-k or nucleus sampling</a>, and any other awesome things you can think of! If you do, feel free to leave a comment about your adventure.</p>
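<p>For a taste of what that would involve, here is a hedged sketch of top-k sampling (this is not the notebook's code): instead of always taking the argmax at each decoding step, keep only the k most probable tokens and sample among them, which helps avoid the repetitive loops visible in some of the predictions above.</p>

```python
import random

# Rough top-k sampling sketch. `probs` is a made-up (token, probability)
# list for a single decoding step; a real decoder would get these from
# the model's softmax output.

def top_k_sample(probs, k=3, rng=random):
    top = sorted(probs, key=lambda p: p[1], reverse=True)[:k]
    total = sum(p for _, p in top)           # renormalize over the top k
    r = rng.random() * total
    for tok, p in top:
        r -= p
        if r <= 0:
            return tok
    return top[-1][0]                        # guard against float rounding

probs = [("the", 0.5), ("a", 0.3), ("returns", 0.15), ("xxeos", 0.05)]
print(top_k_sample(probs, k=2))              # always "the" or "a", never the tail
```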
<p><strong>Tip:</strong>
I did a lot of fiddling to get this to work, and most of my models ended up overfitting. The way I fixed this was being more careful about how I cleaned the data and increasing the dataset size. I know this seems simple, but it is quite effective.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Conclusion">Conclusion<a class="anchor-link" href="#Conclusion"> </a></h1><p>In this tutorial, you created an Automatic Code Comment Generator! You learned how to clean, explore, and process data, and how to use the awesome PyTorch and FastAI libraries to define and train the Transformer architecture. The use of Deep Learning in the field of Software Engineering is what I am studying for my Ph.D., so I hope I have inspired you to think about some other ways you could use Deep Learning to help Software Engineering!</p>
<p>I hope you enjoyed this tutorial. Look out for future blog posts from me about all kinds of topics, as I have set myself a challenge to learn new things this year!
<center>
<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Okay, I'm posing a <a href="https://twitter.com/hashtag/challenge?src=hash&ref_src=twsrc%5Etfw">#challenge</a> to myself: Each month devote an hour a day to learning about a subject I am weak at. At the end of the month, post a blog summarizing what you've learned.<br /><br />March is devoted to <a href="https://twitter.com/hashtag/neuroscience?src=hash&ref_src=twsrc%5Etfw">#neuroscience</a>! Already signed up for <a href="https://twitter.com/hashtag/edX?src=hash&ref_src=twsrc%5Etfw">#edX</a> course!<a href="https://twitter.com/hashtag/ChallengeAccepted?src=hash&ref_src=twsrc%5Etfw">#ChallengeAccepted</a> 😎</p>— Nathan Cooper (@ncooper57) <a href="https://twitter.com/ncooper57/status/1235408134904086529?ref_src=twsrc%5Etfw">March 5, 2020</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div>
</center>
</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="CodeSearchNet-citation">CodeSearchNet citation<a class="anchor-link" href="#CodeSearchNet-citation"> </a></h1>
<pre><code>
@article{husain_codesearchnet_2019,
title = {{CodeSearchNet} {Challenge}: {Evaluating} the {State} of {Semantic} {Code} {Search}},
shorttitle = {{CodeSearchNet} {Challenge}},
url = {http://arxiv.org/abs/1909.09436},
urldate = {2020-03-12},
journal = {arXiv:1909.09436 [cs, stat]},
author = {Husain, Hamel and Wu, Ho-Hsiang and Gazit, Tiferet and Allamanis, Miltiadis and Brockschmidt, Marc},
month = sep,
year = {2019},
note = {arXiv: 1909.09436},
}
</code></pre>
</div>
</div>
</div>
</div>
</p></div></div></div></div></div></div>Awesome Things I Learned Creating My Own Website2020-02-03T00:00:00-06:002020-02-03T00:00:00-06:00https://nathancooper.io/i-am-a-nerd/website/awesome/2020/02/03/Awesome-Things-I-Learned-Creating-My-Own-Website<p>Hello, Solar System! (As a space faring civilization, I feel it only customary we update our greetings to reflect such awesome accomplishments 🤓) I am a nerd and hopefully you are as well.</p>
<p>This post goes over many of the awesome technologies, resources, and overall tips and tricks I learned while creating my own personal website! This post is <strong>NOT</strong> a tutorial, mostly because there are tons of existing ones on how to create a website, and you don’t want to create <em>a website</em> or even <em>my website</em> (though mine is pretty awesome); you want to create <em>your own website</em>. For me, this came from a lot of trial and error and tons of random google searches to figure out some niche feature I wanted to add. So, this post centralizes all of the niche features that went into my website in case any of you out there want to personalize some for your own website. It is not really made to be gone through end to end, but rather for you to pick out the pieces that resonate best with you and give you inspiration for your own website.</p>
<p>Let’s get some of the boring stuff out of the way first, namely these are the main components that my website is made out of:</p>
<ul>
<li>
<p><a href="https://reactjs.org/">ReactJS</a> - using <a href="https://github.com/facebook/create-react-app">create-react-app</a> and</p>
</li>
<li>
<p><a href="https://material-ui.com/">Material-UI</a> - for the style points 😎 (Sadly not as delicious as brownie points)</p>
</li>
<li>
<p><a href="https://pages.github.com/">GitHub Pages</a> - for hosting the static site (ty GitHub <3), requires you to use gh-pages and set up your create-react-app correctly. Here is some <a href="https://create-react-app.dev/docs/deployment/#github-pages">documentation</a> on how that is done</p>
</li>
<li>
<p><a href="https://github.com/">GitHub</a> and GitHub Pages - for storing my projects and for storing the web demos of my projects (Sounds epic, right?!?!? More on this later)</p>
</li>
<li>
<p><a href="https://redux.js.org/">Redux</a> - for storing and updating state of ReactJS thingies like lists of current projects and blog posts (More on this later)</p>
</li>
<li>
<p><a href="https://github.com/rexxars/react-markdown">ReactMarkdown</a> - for rendering my blog posts, which, you guessed it, are markdown files!</p>
</li>
<li>
<p>Paperclips - I couldn’t afford duct tape :(</p>
</li>
</ul>
<h1 id="niche-features-and-tips">Niche Features and Tips</h1>
<h2 id="resume-of-coding-projects">Resume of Coding Projects:</h2>
<p>The core motivation behind my website was that I wanted a cool way to show off some of the projects I’ve created over the years. I’ve seen how others create theirs and actually got inspired in part by <a href="https://nmarch213.github.io/Portfolio/#/projects">https://nmarch213.github.io/Portfolio/#/projects</a>. However, the issue is that these displays of projects need to be manually created, and that was a problem for me because, like most developers, I am lazy. I wanted a way where I could just create new projects and they would be automatically added in the correct format, including images, titles, descriptions, etc. I could just redirect users to my GitHub page, where I place all of my coding projects, but that seemed like a cop out and did not allow for any customization. So, like any good programmer, I made a scraper that took the output from GitHub and converted it into a format for my own usage 😊. This gave my website the ability to update the list of projects automatically as I add new repositories, with the title of each project determined by the repository’s name. To add an image to be displayed as the project’s logo, I just add an <code class="language-plaintext highlighter-rouge">icon.png</code> file to the root of the project and grab the icon from there when displaying the list.</p>
<p>To scrape all this information, I use the amazing <a href="https://developer.github.com/v3/">GitHub API v3</a> that GitHub provides. This API offers a ton of useful features, but for scraping my projects I specifically used the <a href="https://developer.github.com/v3/repos/#list-user-repositories">Repositories API</a>. It also returns information like the repository’s description, so you could include that in your list of projects automatically if you so choose. The GitHub API v3 has a bunch of awesome functionality; another API I use is the <a href="https://developer.github.com/v3/repos/contents/#get-contents">Contents API</a> for listing out the different posts I have created (more on this later)! For integrating these APIs using ReactJS, I suggest using <a href="https://redux.js.org/">Redux</a> for storing the state, i.e., the projects and blog posts once they have returned from the GitHub API, and <a href="https://github.com/axios/axios">Axios</a> for actually making the HTTPS requests.</p>
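<p>On the website itself this happens in React with Axios, but the transformation is easy to sketch in Python against a sample of the JSON the Repositories API returns. The payload below is made up; the field names (<code>name</code>, <code>description</code>, <code>html_url</code>) match the real API response, while the <code>icon.png</code>-at-the-repo-root convention and the <code>master</code> branch in the raw URL are this site's own assumptions.</p>

```python
import json

# Rough sketch of turning a GitHub "list user repositories" response
# into project entries. The payload is a made-up sample; the field
# names (name, description, html_url) match the real API response.

sample_response = json.dumps([
    {"name": "cool-project", "description": "Does cool things",
     "html_url": "https://github.com/someuser/cool-project"},
    {"name": "tiny-tool", "description": None,
     "html_url": "https://github.com/someuser/tiny-tool"},
])

def to_project_entries(payload, user="someuser"):
    entries = []
    for repo in json.loads(payload):
        entries.append({
            "title": repo["name"],                 # title = repo name
            "description": repo["description"] or "",
            "url": repo["html_url"],
            # Assumed convention: icon.png at the repo root on master.
            "icon": f"https://raw.githubusercontent.com/{user}/{repo['name']}/master/icon.png",
        })
    return entries

print(to_project_entries(sample_response)[0]["title"])  # → cool-project
```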
<p>Besides just hosting one website using GitHub Pages, you can have one per repository. Incorporating this with the dynamic list of my coding projects, I am able to now create their own page that can serve as documentation or can even be a web demo! Currently I am not using this feature to the best of its ability, but I plan on overhauling my more put together projects so that they have at the very least some documentation using this awesome feature.</p>
<p>Overall, using very simple components offered by GitHub I am able to have my custom resume of projects that dynamically updates when I create a new one and links to documentation or a web demo of the project. Any updates to the actual project are reflected on the website without additional changes to some configuration file that contains all of the projects I have.</p>
<h2 id="blog">Blog:</h2>
<p>I now do all of my blogging using the new awesome <a href="https://github.com/fastai/fastpages">fastpages</a> library built by Hamel Husain and Jeremy Howard!</p>
<h3 id="deprecated">[Deprecated]</h3>
<p>Aside from being able to highlight my accomplishments, I wanted to be able to express myself on a multitude of topics. I never thought I would be a blogger, but a blog post by the awesome Rachel Thomas, <a href="https://medium.com/@racheltho/why-you-yes-you-should-blog-7d2544ac1045">Why you (yes, you) should blog</a>, inspired me to take it seriously. I tried Medium, but I wanted something of my own. When I saw the equally awesome Jeremy Howard discussing his work on <a href="https://github.com/fastai/fast_template">Fast Template</a>, which allows you to easily create your own personal blog using GitHub Pages and simple Markdown files, I knew the time was now to commit. While I do not directly use Fast Template, because it is a bit too rigid for the amount of customizability I like, I drew a lot of inspiration from it, namely writing blog posts as Markdown files and storing them statically on my website instead of in some weird MongoDB. To integrate this concept of using Markdown files, I needed a way to render them easily using React, which is where I found an awesomely customizable library for doing just that called <a href="https://github.com/rexxars/react-markdown">React Markdown</a>. What’s nice about React Markdown is that each of the formatting components, such as code snippets and headings, is separated into its own rendering engine, allowing you to swap out or customize them very easily. Since I quite enjoy dark theme, I found this awesome post by <a href="https://medium.com/young-developer/react-markdown-code-and-syntax-highlighting-632d2f9b4ada">Bexultan A. Myrzatayev</a> showing how you can use a custom React highlighting engine (I use <a href="https://github.com/conorhastings/react-syntax-highlighter">React Syntax Highlighter</a>) and swap out the default theme for code syntax highlighting for something like “atomOneDark”! (Obviously I chose a dark theme, because dark theme is the only theme.)</p>
<p>To write my posts, I don’t just write directly in Markdown, as that would be extremely painful. I took the advice of Jeremy Howard again from his post on <a href="https://www.fast.ai/2020/01/18/gitblog/">Syncing your blog with your PC, and using your word processor</a> and used a word processor, in my case <a href="https://www.google.com/docs/about/">Google Docs</a>. Sadly, Google Docs does not allow you to directly export your document as a Markdown file, but thankfully an awesome person named Mangini created <a href="https://github.com/mangini/gdocs2md">gdocs2md</a>, which does the conversion, handles grabbing the images, and emails the result to you!</p>
<h2 id="material-design">Material Design:</h2>
<p>As a nerd, and mostly a computer nerd, my artistic skills are not the best. However, I wanted my website to look stylish, clean, modern, cool, hip…. (words I searched google for when trying to create a pretty website). This brought me to Google; they always create such chic-looking websites and mobile apps, in my opinion obviously, so it wasn’t too surprising to learn that Google wrote the bo… well, website for designing GUI components, which they called <a href="https://material.io/">Material Design</a>. This is where Material-UI comes in. It is a complete React component implementation of the Material Design language, and it looks smooootthhhhh. Using it is quite simple, as laid out on their website, but I’m including a code snippet because I want to flex how my website is able to render code snippets, courtesy of <a href="https://github.com/rexxars/react-markdown">React Markdown</a> and <a href="https://medium.com/young-developer/react-markdown-code-and-syntax-highlighting-632d2f9b4ada">Bexultan A. Myrzatayev</a>’s awesome post on how to change the theme used in code snippets:</p>
<p>Code snippet from what a project entry looks like:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">import</span> <span class="p">{</span>
<span class="nx">Button</span><span class="p">,</span>
<span class="nx">Card</span><span class="p">,</span>
<span class="nx">CardActions</span><span class="p">,</span>
<span class="nx">CardContent</span><span class="p">,</span>
<span class="nx">CardMedia</span><span class="p">,</span>
<span class="nx">Typography</span>
<span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@material-ui/core</span><span class="dl">"</span><span class="p">;</span>
<span class="err">…</span><span class="p">.</span>
<span class="kd">const</span> <span class="nx">site</span> <span class="o">=</span> <span class="nx">has_pages</span> <span class="p">?</span> <span class="p">(</span>
<span class="o"><</span><span class="nx">Button</span> <span class="nx">variant</span><span class="o">=</span><span class="dl">"</span><span class="s2">contained</span><span class="dl">"</span> <span class="nx">color</span><span class="o">=</span><span class="dl">"</span><span class="s2">secondary</span><span class="dl">"</span> <span class="nx">href</span><span class="o">=</span><span class="p">{</span><span class="nx">pages_url</span><span class="p">}</span><span class="o">></span>
<span class="nx">View</span> <span class="nx">Site</span>
<span class="o"><</span><span class="sr">/Button</span><span class="err">>
</span>
<span class="p">)</span> <span class="p">:</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span>
<span class="o"><</span><span class="nx">Card</span> <span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nx">classes</span><span class="p">.</span><span class="nx">card</span><span class="p">}</span> <span class="nx">fullWidth</span><span class="o">></span>
<span class="o"><</span><span class="nx">CardMedia</span>
<span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nx">classes</span><span class="p">.</span><span class="nx">media</span><span class="p">}</span>
<span class="nx">image</span><span class="o">=</span><span class="p">{</span><span class="nx">icon_src</span><span class="p">}</span>
<span class="nx">onError</span><span class="o">=</span><span class="p">{</span><span class="nx">e</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">cannot find icon</span><span class="dl">"</span><span class="p">);</span>
<span class="p">}}</span>
<span class="sr">/</span><span class="err">>
</span>
<span class="o"><</span><span class="nx">CardContent</span><span class="o">></span>
<span class="o"><</span><span class="nx">Typography</span> <span class="nx">gutterBottom</span> <span class="nx">variant</span><span class="o">=</span><span class="dl">"</span><span class="s2">headline</span><span class="dl">"</span> <span class="nx">component</span><span class="o">=</span><span class="dl">"</span><span class="s2">h4</span><span class="dl">"</span><span class="o">></span>
<span class="p">{</span><span class="nx">_</span><span class="p">.</span><span class="nx">startCase</span><span class="p">(</span><span class="nx">_</span><span class="p">.</span><span class="nx">camelCase</span><span class="p">(</span><span class="nx">name</span><span class="p">))}</span>
<span class="o"><</span><span class="sr">/Typography</span><span class="err">>
</span>
<span class="o"><</span><span class="sr">/CardContent</span><span class="err">>
</span>
<span class="o"><</span><span class="nx">CardActions</span><span class="o">></span>
<span class="p">{</span><span class="nx">site</span><span class="p">}</span>
<span class="o"><</span><span class="nx">Button</span> <span class="nx">variant</span><span class="o">=</span><span class="dl">"</span><span class="s2">contained</span><span class="dl">"</span> <span class="nx">color</span><span class="o">=</span><span class="dl">"</span><span class="s2">primary</span><span class="dl">"</span> <span class="nx">href</span><span class="o">=</span><span class="p">{</span><span class="nx">html_url</span><span class="p">}</span><span class="o">></span>
<span class="nx">View</span> <span class="nx">Repo</span>
<span class="o"><</span><span class="sr">/Button</span><span class="err">>
</span>
<span class="o"><</span><span class="sr">/CardActions</span><span class="err">>
</span>
<span class="o"><</span><span class="sr">/Card</span><span class="err">>
</span>
<span class="p">);</span>
<span class="err">…</span><span class="p">.</span>
</code></pre></div></div>
<h2 id="development">Development:</h2>
<p>I think programming is an invaluable skill that I am constantly working to improve. So, I highly recommend that those creating their own website at least try to program it themselves. It is an adventure of pain and misery that I wish to inflict onto others, hahaha, haha, ha… But it is extremely rewarding when you finally see that beautiful glowing (please don’t make your website glow, it’s annoying) website plastered on your web browser (please let it be Chromium-based, or just basically not Internet Explorer). All programmers need some system in which to develop whatever it is they care about creating. This is where things like Integrated Development Environments (IDEs), debuggers, and testing frameworks come in handy.</p>
<p>To keep myself sane, I spent a long, arduous, and tedious time experimenting with different workflows for developing systems. And I have found the holy grail that has answered all of <em>my</em> questions and that allows for 10X greater productivity for <em>me</em> (your results will most certainly differ if you decide to use the same development setup). For me, the combination of <a href="https://code.visualstudio.com/">Visual Studio Code</a> and <a href="https://www.docker.com/">Docker</a> is the textbook definition of perfection. In particular, the <a href="https://code.visualstudio.com/docs/remote/containers">Remote Container</a> extension that some genius made. This extension allows you to connect your vscode editor to a Docker container’s file system. So, why is this so important to me? Well, Docker allows you to spin up pretty much any environment you want, such as a node server for hosting a ReactJS website :D, but most importantly it allows you to version and share these environments through Dockerfiles. This lets me specify an environment per project, so I don’t have to install and maintain all of the dependencies, which may conflict with each other, on my local machine. This is why I use Docker for pretty much everything I do, and I also quite enjoy using vscode, so being able to marry the two is absolute perfection!</p>
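<p>For the curious, a minimal Remote Container setup only takes a single file. Here is a sketch of what a <code>.devcontainer/devcontainer.json</code> might look like for a node-based site (the image tag, port, and extension ID are just assumptions for illustration, not my actual config):</p>

```json
{
  // Name shown in the vscode window when attached to the container.
  "name": "react-site",
  // Any Docker image (or a "build" entry pointing at your own Dockerfile).
  "image": "node:14",
  // Forward the dev server port so the site is reachable from the host browser.
  "forwardPorts": [3000],
  // Extensions to install inside the container.
  "extensions": ["esbenp.prettier-vscode"]
}
```

<p>With that file in place, vscode offers to reopen the folder inside the container, and everything you install lives there instead of on your machine.</p>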
<h1 id="conclusion">Conclusion</h1>
<p>So, that concludes my first blog post! I hope you are able to use some of these awesome things I learned while creating my own website. Keep a lookout for my future posts; I am planning on creating posts centered around the following topics:</p>
<ul>
<li>
<p>Machine Language Processing (MLP) - like Natural Language Processing (NLP), but for computer nerds like us 🤓.</p>
</li>
<li>
<p>Automatic Code Comment Generation using Deep Learning.</p>
</li>
</ul>
<p>Also, feel free to contact me through my custom “Contact” system integrated into my website, which uses this awesome <a href="https://github.com/dwyl/learn-to-send-email-via-google-script-html-no-server">Google Script</a> for sending emails without the need to manually set up a backend server, or on Twitter <a href="https://twitter.com/ncooper57">@ncooper57</a> :).</p>
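<p>As a rough sketch of how the client side of such a no-backend contact form works (the endpoint URL below is a placeholder, and <code>encodeForm</code>/<code>sendContactForm</code> are hypothetical names of my own, not part of the linked project): the form fields just get POSTed to your deployed Apps Script’s <code>/exec</code> URL, and the script takes care of the email.</p>

```javascript
// Placeholder: replace with the /exec URL of your own deployed Google Apps Script.
const SCRIPT_URL = "https://script.google.com/macros/s/YOUR_DEPLOYMENT_ID/exec";

// Encode form fields as a URL-encoded query string, the format the
// Apps Script reads from its request parameters on the server side.
function encodeForm(fields) {
  return Object.entries(fields)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

// POST the contact form to the script endpoint; returns the fetch promise.
function sendContactForm(fields) {
  return fetch(SCRIPT_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: encodeForm(fields),
  });
}
```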