Fake user identification

It occurs to me that in this age of social profiles and freely available public records, it ought to be possible to automatically identify fake users with a fairly high degree of accuracy. That could go a long way toward catching people who are scamming user reviews. It probably wouldn’t do anything about the “reviews for purchase” problem, though, in which people are paid to write reviews under their real identities. Community policing is probably still the best bet there.

http://www.forbes.com/sites/suwcharmananderson/2012/08/28/fake-reviews-amazons-rotten-core/

http://www.nytimes.com/2012/08/26/business/book-reviewers-for-hire-meet-a-demand-for-online-raves.html

Cellular EA

One of the big problems I’ve experienced with running pipeline optimizations on a cluster is that a traditional genetic algorithm is not computationally efficient. The reason is that the entire cluster must wait for the longest-running evaluations in a generation to complete, and those can take longer than 99% of the other individuals in that generation. The result is that a cluster might be 95% idle for significant periods of time.
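To put a rough number on that idle time, here is a small back-of-the-envelope sketch. The function name, the worker count, and the evaluation times are all made up for illustration; they are not measurements from my cluster.

```python
def barrier_utilization(eval_times, workers):
    """Fraction of total worker time spent busy when a whole generation
    is evaluated in parallel and every worker waits for the slowest job."""
    # Greedy scheduling: hand the longest remaining job to whichever
    # worker becomes free first.
    finish = [0.0] * workers
    for t in sorted(eval_times, reverse=True):
        i = finish.index(min(finish))
        finish[i] += t
    makespan = max(finish)  # the generation ends when the last worker finishes
    return sum(eval_times) / (workers * makespan)

# Hypothetical generation: 99 individuals take 1 time unit, one straggler takes 20.
times = [1.0] * 99 + [20.0]
util = barrier_utilization(times, workers=100)
print("busy fraction:", util)
```

With these made-up numbers the cluster is busy for only about 6% of the generation, so a single slow individual is enough to leave it roughly 94% idle.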

Cellular EAs are a type of evolutionary algorithm that does not block on a generation. Mating occurs within a local neighborhood in an asynchronous fashion, which allows for full usage of cluster resources. I’ve added an initial grid-based cEA to inspyred and added support for this algorithm to the MNIST sample in pyvotune.
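Here is a minimal sketch of the idea in plain Python. It is not the code in my inspyred fork and does not use inspyred's API. The toroidal grid, the four-cell neighborhood, one-point crossover, bit-flip mutation, and the replace-if-not-worse rule are all assumptions picked to keep the example short. It also runs sequentially; on a cluster, each cell update would be handed to whichever worker is free.

```python
import random

def neighbors(r, c, rows, cols):
    """Up/down/left/right neighbors on a grid that wraps at the edges."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]

def cellular_ea(fitness, genome_len, rows=8, cols=8, updates=5000, seed=0):
    rng = random.Random(seed)
    # One individual (a list of bits) per grid cell.
    grid = {(r, c): [rng.randint(0, 1) for _ in range(genome_len)]
            for r in range(rows) for c in range(cols)}
    scores = {cell: fitness(g) for cell, g in grid.items()}
    for _ in range(updates):
        # Update one cell at a time; there is no generation-wide barrier.
        cell = (rng.randrange(rows), rng.randrange(cols))
        # Mate with the fittest individual in the local neighborhood.
        mate = max(neighbors(cell[0], cell[1], rows, cols), key=scores.get)
        cut = rng.randrange(1, genome_len)  # one-point crossover
        child = grid[cell][:cut] + grid[mate][cut:]
        # Bit-flip mutation, on average one flip per child.
        child = [b ^ 1 if rng.random() < 1.0 / genome_len else b for b in child]
        score = fitness(child)
        # The child replaces the cell's occupant only if it is not worse.
        if score >= scores[cell]:
            grid[cell], scores[cell] = child, score
    best = max(scores, key=scores.get)
    return grid[best], scores[best]

# Toy problem: maximize the number of 1 bits in a 32-bit genome.
best, score = cellular_ea(sum, genome_len=32)
print(score, best)
```

Because each update only touches one cell and its neighbors, a slow evaluation only delays that cell instead of holding up the whole population.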

cEA fork of inspyred @ https://github.com/aelaguiz/inspyred

PyvoTune @ https://github.com/aelaguiz/pyvotune

Translated theano logistic sgd

I’ve started translating the Theano deep learning tutorials from Theano/Python to Python/NumPy. This was mostly to validate my understanding, but also to help me learn Theano.

I’m going to continue translating these; hopefully they are useful to someone.
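For a flavor of what the NumPy side looks like, here is a stripped-down sketch of multiclass logistic regression trained with minibatch SGD. It is not the code from my repo: the function names, the hyperparameters, the per-epoch shuffling, and the toy two-blob dataset (standing in for MNIST) are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_logreg(X, y, n_classes, lr=0.1, batch_size=20, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        order = rng.permutation(n)  # visit the examples in a new order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = softmax(X[idx] @ W + b)
            # Gradient of the negative log-likelihood w.r.t. the logits
            # is (predicted probabilities - one-hot targets).
            p[np.arange(len(idx)), y[idx]] -= 1.0
            W -= lr * (X[idx].T @ p) / len(idx)
            b -= lr * p.mean(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

W, b = train_logreg(X, y, n_classes=2)
error = np.mean(predict(X, W, b) != y)
print("training error:", error)
```

The gradient step is the part Theano derives automatically; in plain NumPy it has to be written out by hand, which is a good check on whether you actually understand it.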

Translated from: http://deeplearning.net/tutorial/logreg.html

Code: https://github.com/aelaguiz/DeepLearningTutorials

I’ve got some simple benchmarks too. Theano is fast! It gets through about 3.9x as many epochs per second as my version.

My version:

Optimization complete with best validation score of 7.479167 %,with test performance 7.489583 %

The code run for 75 epochs, with 1.674691 epochs/sec

The code for file logistic_sgd.py ran for 44.8s

real 0m29.138s

user 0m45.541s

sys 0m2.177s

Theano version:

Optimization complete with best validation score of 7.479167 %,with test performance 7.489583 %

The code run for 75 epochs, with 6.469746 epochs/sec

The code for file logistic_sgd.py ran for 11.6s

real 0m9.328s

user 0m14.780s

sys 0m1.116s