Posts

Showing posts from 2017

A Basic Recipe for Machine Learning

Ever since wrapping up the three Deep Learning courses by Andrew Ng, I've been meaning to write down some of the gems that he highlighted throughout the courses. One of the nice ones that I felt needed to be written down is his general recipe for approaching a deep learning algorithm/model. I've basically summarized it in a flowchart below (because everybody loves a flowchart, right?).

[Figure: Basic deep learning recipe]

What are bias and variance?

The diagram below is the typical explanation that I'm sure most of us are used to.

[Figure: Graphical illustration of bias and variance]

How can we know if we have high bias or high variance?

For high bias, we could take a look at the training set performance. Poor performance there is an indicator of a poor model fit, and signals that we could try a bigger network to get a better fit. For high variance, we could take a look at the validation set (or dev set - as Andrew calls it) performance.
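To make the check above a bit more concrete, here is a minimal Python sketch of how the two diagnoses could be read off the error rates. The function name `diagnose`, the `target_error` baseline and the 2% `tolerance` are all my own assumptions for illustration, not numbers from the course. Comparing the dev set error against the training error is the usual way to read "dev set performance" for variance, so that is what the sketch does.

```python
def diagnose(train_error, dev_error, target_error=0.0, tolerance=0.02):
    """Return likely problems given error rates expressed as fractions in [0, 1].

    target_error and tolerance are hypothetical knobs chosen for illustration.
    """
    findings = []
    # High bias: the model does poorly even on the data it was trained on.
    # A bigger network is one thing to try in this case.
    if train_error - target_error > tolerance:
        findings.append("high bias")
    # High variance: the dev set error is noticeably worse than the training error.
    if dev_error - train_error > tolerance:
        findings.append("high variance")
    return findings or ["looks fine"]


print(diagnose(train_error=0.15, dev_error=0.16))
print(diagnose(train_error=0.01, dev_error=0.11))
print(diagnose(train_error=0.15, dev_error=0.30))
```

A model can show both problems at once, which is why the function returns a list rather than a single label.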

Reviewing Andrew Ng's Deep Learning Course: Neural Network and Deep Learning

I'm feeling rather good about myself as I write this, having just completed the first course of Andrew Ng's latest Deep Learning specialization on Coursera. I've been meaning to learn about Deep Learning for quite a while now, but haven't been able to wrap my head around the theory side of it for the longest time. Previously, my foray into deep learning has been via Udacity's Deep Learning materials, random internet articles, and the Deep Learning textbook. Yes. THE textbook. I bought it from Amazon a few months ago, and am still going through the pages. I'm still finding it tough to fit in a few pages between the day job and sorting out the kids at night. From what I've gone through so far, I'd imagine that I need to brush up on my rusty math in order to fully appreciate the book. I have a confession to make though. I never really did go through Andrew Ng's first ML course (gasps!)

Setting Up Docker for Windows 10

I didn't really have any use for Docker until today. I was trying to follow a course via Safari Online and, long story short, I'd probably need Docker to simplify setting up all the infra. Except setting up Docker itself turned out to be quite a problem for me. Here's how I got it to work.

1. Downloaded Docker (community edition) from their website (https://www.docker.com/)
2. Installed it.
3. Checked whether virtualization, which Hyper-V needs, is enabled. (Go to Task Manager -> Performance -> CPU and you should see a line reading "Virtualization: Enabled") [1]
4. Opened up PowerShell.
5. Used 'docker-machine create ' to create a virtual machine. I named mine 'box'.
6. Configured my shell (refer to image).

Reference:
[1]: https://stackoverflow.com/questions/40459280/docker-cannot-start-on-windows
[2]: https://docs.docker.com/docker-for-windows/#explore-the-application-and-run-examples

Book Review: Weapons of Math Destruction (Cathy O'Neil)

This post marks my first attempt at forcing myself to gain a better understanding of the books that I've read. Previously, I found myself reading book after book without being able to recall the important things that I'd learned earlier. It's rather frustrating, to be honest. So I'm trying this out as a way to push myself to understand the book and synthesize the various concepts and ideas it conveys. A disclaimer: my reviews will not attempt to be neutral or unbiased, as I feel that any attempt to write that kind of blog post would result in a dry and boring outcome. I guess you could say that it'd probably be much more of a rant than a review. Moving on. I bought the book from Amazon quite a while back in April, and it sat on the shelf for quite some time as I was reading another book at that time. The outline is rather interesting, as it highlights the pitfalls of big data implementation from a f

A Retrospective Look On What it Means To Be A Data Scientist

[Figure: The Sexiest Job - HBR]

I've talked about this subject in some of my posts from my earlier years of working as a data scientist, namely in these 2 blog posts:

1. Journey In Data Science
2. Hindsight, 8 Months Down The Analytics Road

So now, 3 years down the road, I guess I am a little more knowledgeable on the matter, and a little bit wiser. Back to that definition I was talking about: recently there have been two articles which I think provide a good description of the skills needed to become a data scientist, and of the role that a data scientist plays in a day-to-day setting. In the final half of this post, I'll include my 2 cents on the articles and how they relate to my daily work.

The Skills [1]

Picking it up from Forbes (which in turn picked it up from Quora), the top 5 skills are:

1. Programming. I guess this is pretty much a no-brainer. Programming skills do come in handy especially when you're trying to (1) massage data, and (2) automa

Research Sample Size

Sometimes being a data scientist requires that you actually act like a "scientist" (obviously). In this post, we're going to have a look at a "not-so" popular subject: determining the right sample size, one that allows you to draw a proper conclusion about the population that you're interested in. More often than not, people assume that a sample size needs to bear some proportional relationship to the size of the population from which it is drawn. This is not necessarily the case. Rather, at some point, having more samples need not mean greater accuracy in your analysis. What this means is that you really don't need to gather as many samples as possible in order to come up with a reasonable conclusion that can be applied to the population at large. The absolute size of a sample is much more important. The size is pretty much dependent on the variation in the population parameters under study and the amount of e

Setting Up Tensorflow (with CUDA) for Windows 10

Below are some of my rough notes on how I set up my Windows 10 laptop to use Tensorflow with CUDA.

My references:
http://www.heatonresearch.com/2017/01/01/tensorflow-windows-gpu.html
https://www.tensorflow.org/install/

Install the following NVIDIA drivers:
CUDA Drivers (http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows). Currently it's CUDA Toolkit 8.0. It can be installed using the downloaded installer.
cuDNN - the CUDA Deep Neural Network library (https://developer.nvidia.com/cudnn). Currently it's 5.1. Once extracted, place the files in the respective directories alongside the other CUDA files in the NVIDIA Toolkit folder.

Setting up Tensorflow (CPU)

Setting up Tensorflow (GPU)

Note: It's 22/2/2017 now and Google has recently released Tensorflow 1.0, which might've rendered the above guide obsolete (I haven't tested it yet).

Update (23/2/2017): The above basically creates 2 new Anaconda instances for you to pla