Posts

Showing posts from 2014

Assign select result to variable in Netezza stored procedure

Now THAT is a lengthy title for a blog post. I was working on a stored procedure to calculate Dijkstra's shortest path when I ran into this problem (as stated above). I looked through Netezza's Stored Procedure guide but couldn't find anything of use (perhaps I was not looking hard enough). Even more unfortunately for me, most of the SQL variants out there couldn't point me in the right direction either (even PostgreSQL! I blame you for not being able to sleep tonight!). After examining the error code in Aginity multiple times, I inferred that the INTO probably had to be put after the statement, since the error message was complaining about not being able to select a variable before doing an INTO. So what if the variable was put after the INTO? Maybe even after the whole statement itself. DECLARE vID varchar; vESTIMATE integer; ... ... select '5','8'--id , estimate, from  ( select row_number

Visualizing Social Network - Part 2

In Part 1, I wrote mostly about how I ended up working in SNA, and I touched a bit on how Gephi can be used as the simplest solution to visualize your network. If you're good at JavaScript - the source generated by Gephi would've given you some idea of how to expand it further to have more selection panels and filtering capabilities. Unfortunately for yours truly - I suck at it. Of course, while I could spend some of my precious time learning a new language, I guess I could also make use of some ready-made tools available in the market. I chose Qlikview in this case, mostly because it has a free personal license (who doesn't like free stuff, right?). Qlikview: traditionally, Qlikview doesn't support network graph visualization in any of its out-of-the-box widgets. However, the beauty/strength of Qlikview in my opinion is its support for extended plugins - which they call extensions. Using JavaScript, one has the ability to create any sor

Building a Data Product

" Do you have differentiation or competitive advantage?  Ultimately, anyone can create analytics on commodity data. Analytics, unless they are really fantastic, don’t differentiate data products. What does differentiate is proprietary data. If you have data no one else has, and you can create useful analytics on it, you’ve got the key to long-term competitive success with data products."  Some excerpts from a nice Wall Street Journal article. The articles lists down a few important questions that needs to be answered before you want to even think of venturing yourself in the data products business. Full article available here  .

Information is Beautiful

Taking a break from writing for today. However, I did come across a few neat blogs on data science and data visualization. Of particular interest is Information is Beautiful. They're not really doing dashboards (only a few) or anything interactive per se (most of the stuff in the blog is infographics), but I do like to see visually informative stuff from time to time and it's quite easy on the eyes. And just because it's not a dashboard or interactive does not mean that it can't be done! Sometimes all that people need is just some inspiration and off they go.

Visualizing Social Network - Part 1

An idea is worthless unless implemented, right? You can have all the data in the world, but if you can't articulate it or visualize it for others to understand, then it's really just more data in your data warehouse. For the last couple of months my colleagues and I were working on social network analysis. Basically trying to understand how the subscribers are interconnected and identifying who the influencers are for targeted marketing - or so we thought that's how it should be. Business justification aside - it was an interesting topic to dive into. After a few days, we managed to come up with our edge and node lists, and later ran a few centrality algorithms to measure each individual within that network. Specifically, we were measuring: Degree - The number of direct connections that one has, i.e. how many direct friends does he have? Closeness centrality - How close (by means of hops) is a person to each of the other people in their network? Betweenness centrality -  Identifying w
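As a rough illustration of the degree and closeness measures described above, here is a minimal Python sketch. It is not the code we actually ran: the toy friendship graph and the function names `degree` and `closeness` are made up for this example, and closeness is computed with one common definition (number of reachable people divided by the total number of hops to reach them), which may differ from what a given tool reports.

```python
from collections import deque

# Hypothetical toy network: each person maps to the set of direct friends.
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"b"},
}

def degree(graph, node):
    """Degree: the number of direct connections a node has."""
    return len(graph[node])

def closeness(graph, node):
    """Closeness: reachable nodes divided by the sum of hop counts to them."""
    # Breadth-first search gives the shortest hop count to every reachable node.
    hops = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    total_hops = sum(hops.values())
    if total_hops == 0:
        return 0.0  # isolated node: nothing is reachable
    return (len(hops) - 1) / total_hops

for person in sorted(graph):
    print(person, degree(graph, person), round(closeness(graph, person), 3))
```

In this toy graph, "b" is directly connected to everyone, so it comes out on top for both measures.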

Person Movement Prediction Using Hidden Markov Models

Humans typically act in a certain habitual pattern; however, they sometimes interrupt their behavior pattern and sometimes completely change it. Our aim is to relieve people of actions that are done habitually without determining a person’s action. The system should learn habits automatically and reverse assumptions if a habit changes. The predictor information should therefore be based on previous behavior patterns and applied to speculate on the future behavior of a person. You can get the complete article here.

Modelling the Ebola Outbreak using Wolfram

The recent outbreak of the Ebola virus disease (EVD) has shown how quickly diseases can spread in human populations. This threat is, of course, not limited to EVD; there are many pathogens, such as various types of influenza (H5N1, H7N9, etc.) with the potential to cause a pandemic. Therefore, mathematical modeling of the transmission pathways becomes ever more important. Health officials need to make decisions as to how to counter the threat. There are a large number of scientific publications on the subject, such as the recent Science publication by Dirk Brockmann, which is available here. Professor Brockmann also produced videos to illustrate the research, which can be found on YouTube (video1, video2, video3). It would be interesting to reproduce some of the results from that paper and generally explore the subject with Mathematica. Full article here:  Modeling a Pandemic like Ebola with the Wolfram Language

Data extraction - Build your own phone using Raspberry Pi

This is not an original post per se - just a repost and a reminder to self of sorts about how I could perhaps someday build my own phone and collect a whole bunch of other stuff that most people wouldn't dream of collecting (but of course they already have, right? :) ). The article: Lifehacker - Build your own phone. The video is by David Hunt.

Google Places API - Part 2

A follow-up to my previous post. After some digging and experimentation, I was finally able to come up with a way to mass search coordinates of a specific location type (e.g. shopping malls across KL). Basically there are a couple of services that you could use, depending on the output that you want. More on that here. So based on the guide, I know that what I need would probably be more or less covered by Radar and Nearby - since I basically need the coordinates, and on a mass scale. In essence (since the full description is available on their website and I'd rather not overcomplicate things) the Radar Search service can give you up to 200 search results at a go, given a coordinate to start with. Sounds good - yes. So I gave the example a go. But it only gives you the coordinates in the result list - without the name of the place. You also get a place id for each of those coordinates, but I somehow wasn't able to use that id to retrieve the place's name - so I sup

Google Places API

For the past few days I've been trying to gather coordinates of places around KL, in an attempt to tag those places to our cell towers for a more in-depth market analysis of our consumers. While one could in a sense get those coordinates manually from Google Maps and jot them down in a spreadsheet - it seemed like a "not-so-smart" solution and troublesome to do in the long run (e.g. your boss asks you to find the number of customers visiting a particular shopping mall today, and tomorrow he wants to know the number of customers visiting golf courses on the outskirts of town. In such a case, jotting down the coordinates manually would be very time consuming - not to mention crazy). Initial attempts included web scraping and looking into the source code of the map (more on that topic later). Somehow those didn't work as well as expected. Hence now I turn to the Google API, or more specifically - Places. There are actually a lot of stuff that

Journey in Data Science

Working in a team of data science enthusiasts can have its benefits. In our team of a few, none of us can claim we're good at data science and/or big data analytics, as we're pretty much new to this field. Data scientists are hard to come by, and most articles out there can attest to that. Accenture suggested a nice idea in their article "The Team Solution to the Data Scientist Shortage": if a data scientist is hard to find - why not build a team that has the necessary skills of a "data scientist"? Inspired by this, I've set myself a goal to at least master some of the necessary skills that make up a data science guy. 8 skills of a Data Scientist, from the Accenture article mentioned above. Which makes sense really. To be a master of all the above-mentioned areas would consume an insane amount of time. Thus to be able to divide the tasks around and focus on achieving the end goal - together - as a team; would mean

Dual in Netezza

When you're used to doing queries in Oracle SQL, it can be confusing at times when you start working in Netezza. While most of the time the syntax is the same, some functions that you're used to using are just not there anymore. One of them is dual. Luckily, a quick google on the topic points you straight to the answer: "_v_dual". For example: select sysdate from _v_dual; Another interesting thing that I recently discovered while working in Netezza (I'm a really, really new user of Netezza) is the existence of a spatial module that contains a lot of geo-related functions. Can't wait to try those out. Will share any stuff that I find soon.

Models

When we talk about big data analytics, the discussion normally revolves around Hadoop, Pig, MapReduce, NoSQL - platforms, basically. Granted - those technologies are the enablers of big data, for without them one can't store big data in a data warehouse. For now though, I'd like to focus on the math. The models, to be precise. For without them, one can't derive any usable use cases from all the data that you have (you might be able to pick the low-hanging fruit, but over the years you will have to put on your math hat as well). Below is a link to a course on Coursera which I find to be perfect for beginners like myself who are rather new to statistics and the different models that can be used to answer different questions. The reason I'm mentioning this course in particular is that I like the pace at which it is going and how the presenter is able to articulate complex ideas in simple words. https://class.courser

Unable to drag and drop icon from Desktop

Had this problem recently where I couldn't drag and drop any of the icons on my desktop. I could copy and paste them though - so there is that workaround. But still - having to do that for everything that I want copied/moved over to some other folder is really annoying. Googled the problem, and I believe the consensus in the link below gives the best answer. http://www.sevenforums.com/general-discussion/15038-i-can-t-drag-drop.html Basically, what I did to solve my issue was just to press the 'ESC' key a few times. More technical details on why it worked are in the link above (go read them there! I'm not going to simplify things for you :P ).

PowerPoint 2010 slows when changing font

Had this problem recently when creating some slides for work. Whenever I tried to change the font or resize my text, the program would slow down/lag/hang for a while before making the changes and allowing me to do anything else. After some googling, I found that others have been complaining about PowerPoint 2010's performance as well, though for them the program slows down whenever they try to type stuff in. Among the workarounds suggested are:
1. Hide the background picture.
2. Enlarge the preview slide that is on the left-hand side.
3. Turn off live preview.
4. Tune some printer settings (maybe the program is trying to find some printer or something, so turn that off).
Cutting things short - the third one did the trick for me.