[SR] Week 1 translations #720

Open · wants to merge 50 commits into master

Commits (50)
6e38c95
added initial readme
EmaPajic Oct 17, 2020
20ac952
added translation of readme
EmaPajic Oct 17, 2020
6cab86d
added switch to Serbian language
EmaPajic Oct 18, 2020
04bc7f5
added translation of index.md
EmaPajic Oct 18, 2020
22d25b5
updated sr to srb
EmaPajic Oct 19, 2020
329bfe4
updated srb to rs
EmaPajic Oct 19, 2020
7f733f5
Update and rename README-SR.md to README-RS.md
EmaPajic Oct 19, 2020
045809a
Rename docs/sr/README-RS.md to docs/rs/README-RS.md
EmaPajic Oct 19, 2020
a53fc52
Rename docs/sr/index.md to docs/rs/index.md
EmaPajic Oct 19, 2020
2a4e4fa
updated representation learning translation
EmaPajic Oct 19, 2020
d756488
resolved comments on pr
EmaPajic Oct 22, 2020
c35f3aa
Update and rename docs/rs/README-RS.md to docs/sr/README-SR.md
EmaPajic Nov 26, 2020
a6b32cd
Rename docs/rs/index.md to docs/sr/index.md
EmaPajic Nov 26, 2020
902789b
change rs to sr
EmaPajic Nov 26, 2020
29cf578
added serbian
EmaPajic Nov 26, 2020
1bcc3d7
Merge branch 'master' into master
Atcold Dec 11, 2020
ecc17ec
synced
EmaPajic Dec 11, 2020
2b5c347
Merge branch 'master' of https://github.com/Atcold/pytorch-Deep-Learning
EmaPajic Dec 11, 2020
b04ae98
add initial week1
EmaPajic Dec 12, 2020
048baeb
add first 2 chapters translation
EmaPajic Dec 12, 2020
60bd44b
translated 01-1
EmaPajic Dec 13, 2020
c98dd13
translated 01
EmaPajic Dec 13, 2020
dd5023c
small fix 01
EmaPajic Dec 13, 2020
5a68983
translation 01-2 part1
EmaPajic Dec 13, 2020
f71b7bd
translated 01-2
EmaPajic Dec 13, 2020
4f5c58f
title fix 01-2
EmaPajic Dec 13, 2020
218547b
translated 01-3
EmaPajic Dec 13, 2020
b6e58db
add serbian week01
EmaPajic Dec 13, 2020
d78fd6b
change en to sr in links
EmaPajic Dec 13, 2020
0c28410
Update 01-1.md
Atcold Dec 14, 2020
daebe4a
Update 01-2.md
Atcold Dec 14, 2020
70652c9
Update 01.md
Atcold Dec 14, 2020
1442ce4
add comments with original text
EmaPajic Dec 14, 2020
1e4e8f9
add comments with original text 01-2
EmaPajic Dec 14, 2020
dbbaf26
add comments with original text 01-3
EmaPajic Dec 14, 2020
32085a2
add comments with original text 01
EmaPajic Dec 14, 2020
2d46760
change to sr
EmaPajic Dec 14, 2020
0a1ce2a
Update README-SR.md
EmaPajic Dec 16, 2020
cd4fc56
Update 01-2.md
EmaPajic Dec 16, 2020
d3539b7
Update 01-3.md
EmaPajic Dec 16, 2020
eeb83c0
Update docs/sr/week01/01-2.md
EmaPajic Dec 17, 2020
96d2700
Update docs/sr/week01/01-3.md
EmaPajic Dec 17, 2020
a766871
Update docs/sr/week01/01-2.md
EmaPajic Dec 17, 2020
99f2d88
Update docs/sr/week01/01-2.md
EmaPajic Dec 17, 2020
78efae4
Update docs/sr/week01/01-2.md
EmaPajic Dec 17, 2020
6ca6b44
Update docs/sr/week01/01-2.md
EmaPajic Dec 17, 2020
2d93d2e
Update 01-1.md
EmaPajic Dec 19, 2020
3293848
Update 01-1.md
EmaPajic Dec 19, 2020
b117712
Update 01.md
EmaPajic Dec 19, 2020
127f107
Update 01-2.md
EmaPajic Dec 19, 2020
8 changes: 7 additions & 1 deletion docs/_config.yml
@@ -708,7 +708,13 @@ vi:
################################### Serbian ####################################
sr:
  title: 'Duboko Učenje'
  chapters:
  - path: sr/week01/01.md
    sections:
    - path: sr/week01/01-1.md
    - path: sr/week01/01-2.md
    - path: sr/week01/01-3.md

################################### Bengali ####################################
bn:
  title: 'ডীপ লার্নিং'
4 changes: 2 additions & 2 deletions docs/sr/README-SR.md
@@ -43,9 +43,9 @@ source activate pDL
```


-## Startovati Jupyter Notebook ili JupyterLab
+## Pokrenuti Jupyter Notebook ili JupyterLab

-Startovati iz terminala:
+Pokrenuti iz terminala:

```bash
jupyter lab
109 changes: 55 additions & 54 deletions docs/sr/index.md

Large diffs are not rendered by default.

158 changes: 158 additions & 0 deletions docs/sr/week01/01-1.md

Large diffs are not rendered by default.

157 changes: 157 additions & 0 deletions docs/sr/week01/01-2.md
@@ -0,0 +1,157 @@
---
lang: sr
lang-ref: ch.01-2
lecturer: Yann LeCun
title: Evolucija, primene konvolucionih neuronskih mreža i zašto duboko učenje?
authors: Marina Zavalina, Peeyush Jain, Adrian Pearl, Davida Kollmar
date: 27 Jan 2020
translation-date: 13 Dec 2020
translator: Ema Pajić
---

<!-- Evolution of CNNs
-->
## [Evolucija konvolucionih neuronskih mreža (CNN)](https://www.youtube.com/watch?v=0bMe_vCZo30&t=2965s)

<!--In animal brains, neurons react to edges that are at particular orientations. Groups of neurons that react to the same orientations are replicated over all of the visual field.
-->
U mozgu životinja, neuroni reaguju na ivice koje su specifične orijentacije. Grupe neurona koje reaguju na istu orijentaciju nalaze se svuda po vidnom polju.

<!--Fukushima (1982) built a neural net (NN) that worked the same way as the brain, based on two concepts. First, neurons are replicated across the visual field. Second, there are complex cells that pool the information from simple cells (orientation-selective units). As a result, the shift of the picture will change the activation of simple cells, but will not influence the integrated activation of the complex cell (convolutional pooling).
-->
Fukušima je 1982. godine napravio neuronsku mrežu koja je radila na isti način kao i mozak, bazirano na dva koncepta. Prvo, neuroni su postavljeni po celom vidnom polju. Drugo, postoje kompleksne ćelije koje agregiraju informacije iz jednostavnih ćelija (jedinica koje reaguju na orijentaciju). Kao rezultat, pomeraj slike će promeniti aktivacije jednostavnih ćelija, ali neće uticati na agregiranu aktivaciju kompleksne ćelije (agregiranje konvolucijom).

Review comment: bazirano -> i bila je bazirana


<!--LeCun (1990) used backprop to train a CNN to recognize handwritten digits. There is a demo from 1992 where the algorithm recognizes the digits of any style. Doing character/pattern recognition using a model that is trained end-to-end was new at that time. Previously, people had used feature extractors with a supervised model on top.
-->
LeCun je 1990. godine iskoristio propagaciju unazad da obuči konvolucionu neuronsku mrežu da prepozna rukom pisane cifre. Postoji video iz 1992. gde algoritam prepoznaje cifre napisane različitim stilovima. Prepoznavanje karaktera / oblika koristeći model koji rešava problem od početka do kraja je bilo novo u to vreme. Ranije je bilo neophodno izvlačenje obeležja pre modela nadgledanog učenja.

Review comments:
karaktera -> slova
je bilo novo u to vreme -> predstavljala je inovaciju u to vreme
izvlačenje obeležja: not sure if this is immediately understandable as feature extractor


<!--These new CNN systems could recognize multiple characters in the image at the same time. To do it, people used a small input window for a CNN and swiped it over the whole image. If it activated, it meant there was a particular character present.
-->
Novi CNN sistemi mogli su da prepoznaju više karaktera na slici istovremeno. To se radilo tako što je postojao mali prozor koji se pomerao po celoj slici i on je prosleđivan na ulaz modela. Ako se aktivira, to znači da je prisutan određeni karakter.

<!--Later, this idea was applied to faces/people detection and semantic segmentation (pixel-wise classification). Examples include Hadsell (2009) and Farabet (2012). This eventually became popular in industry, used in autonomous driving applications such as lane tracking.
-->
Kasnije je ova ideja primenjena na detekciju lica / ljudi i semantičku segmentaciju (klasifikaciju piksela na slici). Primeri za to su Hadsel (2009) i Farabet (2012). Vremenom je ovo postalo popularno u industriji i koristi se, na primer, za praćenje trake na putu u autonomnoj vožnji.

<!--Special types of hardware to train CNN were a hot topic in the 1980s, then the interest dropped, and now it has become popular again.
-->
Specijalan hardver za obučavanje konvolucionih neuronskih mreža je bila popularna tema 1980-ih, zatim je interesovanje opalo, ali se ponovo vratilo u skorije vreme.

<!--The deep learning (though the term was not used at that time) revolution started in 2010-2013. Researchers focused on inventing algorithms that could help train large CNNs faster. Krizhevsky (2012) came up with AlexNet, which was a much larger CNN than those used before, and trained it on ImageNet (1.3 million samples) using GPUs. After running for a couple of weeks AlexNet beat the performance of the best competing systems by a large margin -- a 25.8% vs 16.4% top-5 error rate.
-->
Revolucija dubokog učenja (doduše, ovaj termin se nije koristio u to vreme) je počela 2010.-2013. Naučnici su se fokusirali na smišljanje algoritama koji bi mogli da ubrzaju treniranje velikih konvolucionih neuronskih mreža. Križevski je 2012. osmislio AlexNet, mnogo veću konvolucionu neuronsku mrežu nego što su ranije korišćene, i obučio je na ImageNet-u (skupu podataka sa oko 1.3 miliona odbiraka) koristeći GPU (grafičku procesorsku jedinicu). Nakon obučavanja nekoliko nedelja, AlexNet je imao značajno bolje rezultate od najboljih rivalskih sistema -- 25.8% *vs.* 16.4% top-5 procenat greške.

<!--After seeing AlexNet's success, the computer vision (CV) community was convinced that CNNs work. While all papers from 2011-2012 that mentioned CNNs had been rejected, since 2016 most accepted CV papers use CNNs.
-->
Nakon uspeha AlexNet-a, naučnici iz oblasti računarske vizije bili su ubeđeni da konvolucione neuronske mreže rade. Dok su svi radovi iz 2011.-2012. koji su pominjali CNN bili odbijeni, nakon 2016. najveći broj objavljenih radova koristi CNN.

<!--Over the years, the number of layers used has been increasing: LeNet -- 7, AlexNet -- 12, VGG -- 19, ResNet -- 50. However, there is a trade-off between the number of operations needed to compute the output, the size of the model, and its accuracy. Thus, a popular topic now is how to compress the networks to make the computations faster.
-->
Vremenom se broj slojeva povećavao: LeNet -- 7, AlexNet -- 12, VGG -- 19, ResNet -- 50. Međutim, postoji kompromis između broja operacija potrebnog da se sračuna izlaz modela, veličine modela i njegove tačnosti. Iz tog razloga, trenutno popularna tema je kako kompresovati mreže da bi bile brže.


<!-- Deep learning and feature extraction
-->
## [Duboko učenje i izvlačenje obeležja](https://www.youtube.com/watch?v=0bMe_vCZo30&t=3955s)

<!--Multilayer networks are successful because they exploit the compositional structure of natural data. In compositional hierarchy, combinations of objects at one layer in the hierarchy form the objects at the next layer. If we mimic this hierarchy as multiple layers and let the network learn the appropriate combination of features, we get what is called Deep Learning architecture. Thus, Deep Learning networks are hierarchical in nature.
-->
Višeslojne mreže su uspešne jer koriste kompozicionu strukturu podataka. U kompozicionoj hijerarhiji, kombinacije objekata na jednom sloju hijerarhije kreiraju objekte na sledećem sloju. Ako imitiramo tu hijerarhiju pomoću više slojeva i pustimo mrežu da uči odgovarajuću kombinaciju obeležja, dobijemo arhitekturu duboke neuronske mreže. Dakle, duboke neuronske mreže su prirodno hijerarhijske.

<!--Deep learning architectures have led to an incredible progress in computer vision tasks ranging from identifying and generating accurate masks around the objects to identifying spatial properties of an object. Mask-RCNN and RetinaNet architectures mainly led to this improvement.
-->
Arhitekture dubokog učenja dovele su do neverovatnog napretka u računarskoj viziji, na raznim problemima, počevši od identifikacije i generacije tačnih "maski" objekata do identifikacije prostornih odlika objekta. Mask-RCNN i RetinaNet arhitekture su većinom dovele do ovog napretka.

<!--Mask RCNNs have found their use in segmenting individual objects, i.e. creating masks for each object in an image. The input and output are both images. The architecture can also be used to do instance segmentation, i.e. identifying different objects of the same type in an image. Detectron, a Facebook AI Research (FAIR) software system, implements all these state-of-the-art object detection algorithms and is open source.
-->
Mask-RCNN mreže su pronašle primenu u segmentaciji pojedinačnih objekata, tj. kreiranju maske za svaki objekat na slici. I ulaz i izlaz mreže su slike. Arhitektura takođe može da se primeni na segmentaciju instanci, tj. identifikaciju različitih objekata istog tipa na slici. Detectron, softverski sistem Facebook AI Research (FAIR) centra, implementira sve ove najsavremenije algoritme detekcije objekata i otvorenog je koda.

<!--Some of the practical applications of CNNs are powering autonomous driving and analysing medical images.
-->
Neke od primena konvolucionih neuronskih mreža su i omogućavanje autonomne vožnje i analiza medicinskih slika.

<!--Although the science and mathematics behind deep learning is fairly understood, there are still some interesting questions that require more research. These questions include: Why do architectures with multiple layers perform better, given that we can approximate any function with two layers? Why do CNNs work well with natural data such as speech, images, and text? How are we able to optimize non-convex functions so well? Why do over-parametrised architectures work?
-->
Iako su nauka i matematika iza dubokog učenja dosta dobro shvaćene, i dalje postoje zanimljiva pitanja koja treba istražiti. Na primer: Zašto arhitekture sa više slojeva rade bolje, uzevši u obzir da možemo da aproksimiramo bilo koju funkciju pomoću dva sloja? Zašto konvolucione neuronske mreže rade dobro sa prirodnim podacima kao što su govor, slike i tekst? Kako uspevamo da toliko dobro optimizujemo nekonveksne funkcije? Zašto arhitekture sa previše parametara rade?

<!--Feature extraction consists of expanding the representational dimension such that the expanded features are more likely to be linearly separable; data points in higher dimensional space are more likely to be linearly separable due to the increase in the number of possible separating planes.
-->
Izdvajanje obeležja sastoji se od proširivanja dimenzije reprezentacije tako da proširena obeležja verovatnije budu linearno separabilna; tačke u prostoru više dimenzije su verovatnije linearno separabilne zbog povećanja broja potencijalnih separacionih ravni.

<!--Earlier machine learning practitioners relied on high quality, hand crafted, and task specific features to build artificial intelligence models, but with the advent of Deep Learning, the models are able to extract the generic features automatically. Some common approaches used in feature extraction algorithms are highlighted below:
-->
Ranije se u primenama mašinskog učenja oslanjalo na kvalitetna, ručno konstruisana obeležja, specifična za zadatak, ali sa pojavom dubokog učenja modeli su u mogućnosti da automatski izdvoje opšta obeležja. Neki od pristupa korišćenih u algoritmima za izdvajanje obeležja (mala skica jednog od njih data je ispod liste):

<!--- Space tiling -->
<!--- Random Projections-->
<!--- Polynomial Classifier (feature cross-products)-->
<!--- Radial basis functions-->
<!--- Kernel Machines-->

- Popločavanje prostora
- Nasumične projekcije
- Polinomijalni klasifikator (proizvodi parova obeležja)
- Radijalne bazisne funkcije
- Kernel mašine
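
Mala skica ideje o proširivanju reprezentacije, u NumPy-ju: dodavanjem proizvoda para obeležja $x_1 x_2$ (pomenuto kod polinomijalnog klasifikatora), XOR problem, koji nije linearno separabilan u dve dimenzije, postaje linearno separabilan u proširenom trodimenzionalnom prostoru.

```python
import numpy as np

# XOR u originalnom 2D prostoru nije linearno separabilan.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Proširenje reprezentacije proizvodom obeležja: (x1, x2) -> (x1, x2, x1 * x2)
X_pros = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])

# U proširenom prostoru hiperravan x1 + x2 - 2 * x1 * x2 = 0.5 savršeno razdvaja klase.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
predikcija = (X_pros @ w + b > 0).astype(int)
print(predikcija)  # [0 1 1 0] -- poklapa se sa y
```

Ista ideja, u višim dimenzijama, stoji iza polinomijalnih klasifikatora i kernel mašina sa gornje liste.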

<!--Because of the compositional nature of data, learned features have a hierarchy of representations with increasing level of abstractions. For example:
-->
Zbog kompozitne prirode podataka, naučena obeležja imaju hijerarhiju reprezentacija sa rastućim nivoem apstrakcije. Na primer:

<!--- Images - At the most granular level, images can be thought of as pixels. Combination of pixels constitute edges which when combined forms textons (multi-edge shapes). Textons form motifs and motifs form parts of the image. By combining these parts together we get the final image.
-->
- Slike - Na najmanjem nivou, slike su pikseli. Kombinacija piksela čini ivice, dok kombinacija ivica čini tekstone (oblike sa više ivica). Tekstoni čine motive, a motivi čine delove slike. Kombinacijom delova slike dobijamo celu sliku.
<!--- Text - Similarly, there is an inherent hierarchy in textual data. Characters form words, when we combine words together we get word-groups, then clauses, then by combining clauses we get sentences. Sentences finally tell us what story is being conveyed.
-->
- Tekst - Slično, postoji inherentna hijerarhija u tekstualnim podacima. Karakteri formiraju reči, reči formiraju grupe reči, zatim klauzule, a kombinacijom klauzula dobijamo rečenice. Rečenice čine priču koja je zapisana.
<!--- Speech - In speech, samples compose bands, which compose sounds, which compose phones, then phonemes, then whole words, then sentences, thus showing a clear hierarchy in representation.
-->
- Govor - U govoru, odbirci čine opsege, opsezi čine zvukove, zvukovi čine glasove (fone), zatim foneme, pa cele reči i na kraju rečenice, čime se takođe vidi jasna hijerarhija u reprezentaciji.


<!--Learning representations
-->
## [Učenje reprezentacija](https://www.youtube.com/watch?v=0bMe_vCZo30&t=4767s)

<!--There are those who dismiss Deep Learning: if we can approximate any function with 2 layers, why have more?
-->
Ne žele svi da prihvate duboko učenje: ako možemo da aproksimiramo bilo koju funkciju pomoću 2 sloja, zašto koristiti više?

<!--For example: SVMs find a separating hyperplane "in the span of the data," meaning predictions are based on comparisons to training examples. SVMs are essentially a very simplistic 2 layer neural net, where the first layer defines "templates" and the second layer is a linear classifier. The problem with 2 layer fallacy is that the complexity and size of the middle layer is exponential in N (to do well with a difficult task, need LOTS of templates). But if you expand the number of layers to log(N), the layers become linear in N. There is a trade-off between time and space.
-->
Na primer: SVM nalazi separacionu hiperravan "preko podataka", tj. predikcije su bazirane na poređenjima sa podacima iz obučavajućeg skupa. SVM je u suštini veoma jednostavna dvoslojna neuronska mreža, gde prvi sloj definiše "šablone", a drugi sloj je linearni klasifikator. Problem sa dvoslojnom mrežom je to što je kompleksnost i veličina srednjeg sloja eksponencijalna po $N$ (da bi dobro radila na teškom zadatku, potrebno je PUNO šablona). Međutim, ako proširimo broj slojeva na $\log(N)$, slojevi postaju linearni po $N$. Postoji kompromis između vremena i prostora.

<!--An analogy is designing a circuit to compute a boolean function with no more than two layers of gates - we can compute **any boolean function** this way! But, the complexity and resources of the first layer (number of gates) quickly becomes infeasible for complex functions.
-->
Analogija je dizajniranje kola koje računa Bulovu funkciju sa ne više od dva sloja kapija -- možemo da sračunamo **bilo koju Bulovu funkciju** na ovaj način! Međutim, kompleksnost i resursi prvog sloja (broj kapija) brzo postaju nepraktični za kompleksne funkcije.
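
Mala skica gornjeg argumenta o „šablonima", u NumPy-ju: dvoslojna mreža (detektori šablona + linearni izlaz) može da izračuna parnost $N$ bitova, što jeste Bulova funkcija, ali joj za to treba $2^{N-1}$ skrivenih jedinica -- po jedna za svaki ulaz neparne parnosti.

```python
import numpy as np
from itertools import product

def parnost_dva_sloja(n):
    # Skriveni sloj: po jedna "šablon" jedinica za svaki ulaz neparne parnosti (2^(n-1) jedinica).
    sabloni = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
    W = np.array([[1.0 if bit else -1.0 for bit in p] for p in sabloni])
    b = 0.5 - np.array([sum(p) for p in sabloni])  # jedinica se aktivira samo na tačno svoj šablon
    def mreza(x):
        skriveni = (W @ x + b > 0)      # detektori šablona
        return int(skriveni.any())      # izlaz: "ili" preko svih šablona (linearni klasifikator)
    return mreza, len(sabloni)

mreza, broj_sablona = parnost_dva_sloja(4)
print(broj_sablona)                   # 8 = 2^(4-1) skrivenih jedinica
print(mreza(np.array([1, 0, 1, 1])))  # 1 (neparan broj jedinica)
print(mreza(np.array([1, 0, 1, 0])))  # 0 (paran broj jedinica)
```

Dublja mreža istu funkciju računa kaskadom XOR operacija sa brojem jedinica koji raste linearno po $N$, što ilustruje pomenuti kompromis između dubine (vremena) i širine (prostora).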

<!--What is "deep"?
-->
Šta je "duboko"?

<!-- - An SVM isn't deep because it only has two layers
- A classification tree isn't deep because every layer analyses the same (raw) features
- A deep network has several layers and uses them to build a **hierarchy of features of increasing complexity**
-->
- SVM nije dubok jer ima samo 2 sloja
- Klasifikaciono stablo nije duboko jer svaki sloj analizira ista (sirova) obeležja
- Duboka neuronska mreža ima više slojeva i koristi ih da napravi **hijerarhiju obeležja rastuće kompleksnosti**

<!--How can models learn representations (good features)?
-->
Kako modeli uče reprezentacije (dobra obeležja)?

<!--Manifold hypothesis: natural data lives in a low-dimensional manifold. Set of possible images is essentially infinite, set of "natural" images is a tiny subset. For example: for an image of a person, the set of possible images is on the order of magnitude of the number of face muscles they can move (degrees of freedom) ~ 50. An ideal (and unrealistic) feature extractor represents all the factors of variation (each of the muscles, lighting, *etc.*).
-->
Manifold hipoteza: prirodni podaci leže na nisko-dimenzionom manifoldu. Skup mogućih slika je u suštini beskonačan, ali je skup "prirodnih" slika mali podskup. Na primer: za sliku osobe, skup mogućih slika je reda veličine broja mišića lica koji mogu da se pomere (stepeni slobode) ~ 50. Idealan (i nerealističan) izdvajač obeležja reprezentuje sve faktore varijacije (svaki mišić, svetlost, itd.).

<!--Q&A from the end of lecture:
-->
Pitanja i odgovori sa kraja lekcije:

<!--- For the face example, could some other dimensionality reduction technique (*i.e.* PCA) extract these features?
-->
<!-- - Answer: would only work if the manifold surface is a hyperplane, which it is not
-->
- Za primer lica, da li bi neka druga tehnika redukcije dimenzija (npr. PCA) uspela da izvuče ta obeležja?
- Odgovor: to bi radilo samo ako je površina manifolda hiperravan, što nije.
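
Mala skica poslednjeg odgovora, u NumPy-ju: za tačke na zakrivljenom manifoldu (krug u ravni) jedna glavna komponenta PCA ne može dobro da rekonstruiše podatke, dok za manifold koji jeste hiperravan (prava) rekonstrukcija uspeva.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_greska_rekonstrukcije(X, k=1):
    # Projekcija na prvih k glavnih komponenti, rekonstrukcija i srednja kvadratna greška.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xr = Xc @ Vt[:k].T @ Vt[:k]
    return np.mean((Xc - Xr) ** 2)

t = rng.uniform(0, 2 * np.pi, 500)
krug = np.stack([np.cos(t), np.sin(t)], axis=1)   # 1D manifold (krug) ugrađen u 2D
prava = np.stack([t, 2 * t + 1], axis=1)          # 1D manifold (prava) ugrađen u 2D

print(pca_greska_rekonstrukcije(prava))  # ~0: manifold je hiperravan, PCA je dovoljan
print(pca_greska_rekonstrukcije(krug))   # ~0.25: zakrivljen manifold, jedna linearna komponenta nije dovoljna
```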