Infinite Scrolling in Android

Infinite scrolling has become very popular in recent years, especially on mobile devices, because it lets you fetch new data while the user is still browsing data that’s already been fetched. The concept is pretty simple: once the user is detected to be near the end of the list, a call is made to fetch more data, which is then appended to the end of the list. This post goes through my implementation, contained here.
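The "near the end of the list" check can be sketched in a few lines. This is a platform-neutral illustration in TypeScript, not the implementation from the post; the names `ScrollState`, `shouldLoadMore`, `appendPage`, and the default threshold of 5 items are all assumptions made for the example.

```typescript
// Hypothetical state a list view would report while the user scrolls.
interface ScrollState {
  lastVisibleIndex: number; // index of the last item currently on screen
  totalLoaded: number;      // number of items fetched so far
  isLoading: boolean;       // true while a fetch is already in flight
  hasMore: boolean;         // false once the source has no more data
}

// Returns true when the next page should be requested.
// `threshold` (assumed default: 5) is how many unseen items may remain
// below the last visible one before a fetch is triggered.
function shouldLoadMore(state: ScrollState, threshold: number = 5): boolean {
  // Never start a second fetch, and never fetch past the end of the data.
  if (state.isLoading || !state.hasMore) return false;
  const remaining = state.totalLoaded - 1 - state.lastVisibleIndex;
  return remaining <= threshold;
}

// Appends a freshly fetched page to the end of the list without
// mutating the array that is currently being displayed.
function appendPage<T>(items: readonly T[], page: readonly T[]): T[] {
  return [...items, ...page];
}
```

Checking `isLoading` first matters in practice: scroll callbacks tend to fire many times in a row, and without the guard each one near the end of the list would start its own request.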

Data Join Techniques in JavaScript

When I decided to cut over my festival guide to a standalone site, I needed to figure out a way to do the data operations in the client’s browser that I had been doing on the MySQL server. Specifically, I needed to emulate joining tables. The nested loop join is the simplest join you can do in SQL, but it happens to be the most costly. The merge join is a much faster method that can also be easily emulated in JavaScript, but it is only advantageous if the sets are already sorted, because the cost of sorting them first can offset any performance gain over the nested loop join. I wanted to find out if any of the popular JavaScript data manipulation libraries offer an advantage over using standard JavaScript operations.
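To make the difference between the two joins concrete, here is a minimal sketch of both in TypeScript. It is not the code from the post: the `Artist` and `Show` row shapes are made up for illustration, and the merge join assumes that both arrays are already sorted ascending on the join key and that artist ids are unique (a one-to-many join).

```typescript
// Hypothetical row shapes, chosen only for this example.
interface Artist { id: number; name: string; }
interface Show { artistId: number; venue: string; }
interface Joined { name: string; venue: string; }

// Nested loop join: compares every artist with every show, O(n * m).
function nestedLoopJoin(artists: Artist[], shows: Show[]): Joined[] {
  const out: Joined[] = [];
  for (const a of artists) {
    for (const s of shows) {
      if (a.id === s.artistId) out.push({ name: a.name, venue: s.venue });
    }
  }
  return out;
}

// Merge join: walks both inputs once, O(n + m).
// Assumes `artists` is sorted by id, `shows` is sorted by artistId,
// and artist ids are unique.
function mergeJoin(artists: Artist[], shows: Show[]): Joined[] {
  const out: Joined[] = [];
  let i = 0;
  let j = 0;
  while (i < artists.length && j < shows.length) {
    const a = artists[i];
    const s = shows[j];
    if (a.id < s.artistId) {
      i++; // this artist has no (more) shows
    } else if (a.id > s.artistId) {
      j++; // this show has no matching artist
    } else {
      out.push({ name: a.name, venue: s.venue });
      j++; // stay on the same artist; it may have more shows
    }
  }
  return out;
}
```

If the inputs are not already sorted, each would need a sort on the join key before `mergeJoin` is called, and that extra O(n log n) work is the cost that can cancel out the gain over the nested loop.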

Text Deduplication in SQL

Data deduplication is essential when importing similar data from different sources. Different providers store data differently, and several variations (both correct and incorrect) exist in the English language for names of people, companies, and entities in general. Deduplication is often made easier if there is a lot of other information associated with the data, because it gives you several things to compare when identifying a dupe (such as a birthday for a person or a location for a company). When you’re trying to identify duplicates by name alone, things get a bit tricky.

Mapping the Snapchat Data Leak

I’ve been following the Snapchat data leak pretty closely the past few weeks, from the announced weakness to the actual leak of the phone numbers. What I found most interesting about this in particular was that instead of email addresses, password hashes, or credit cards, the leaked data was geographical, mappable data.

Hashing With SQL Server CLR

I have been looking at using hashes in a computed column to determine equality among rows, rather than comparing each column. While running some tests, I encountered a limitation with SQL Server’s HASHBYTES function: the input can only be 8000 bytes or smaller. This won’t work for our purposes, as some of our tables have NVARCHAR(MAX) columns whose maximum length exceeds 8000 bytes. One solution I’m looking into is using SQL Server’s CLR integration.

Moving Files in Android

Android users have the luxury of being able to access the storage on their devices. Because of this, many prefer to maintain their own directory structure and organization pattern for their files. A frequently requested feature for Vibe Vault is the ability to change the directory to which we download files. Many apps unfortunately don’t allow users to change where files are stored, and I suspect it’s either because developers hard-code path variables, or they don’t want to risk moving around existing files and directories. When it comes to moving files, there are two ways to do so: copying and deleting, or simply renaming.

Moving a File In The Android MediaStore

When you move or copy a media file in Android, any app that accesses it via the MediaStore will not automatically be updated. In one of my previous articles I talked about querying and manipulating Android’s MediaStore, and the cost of doing a full scan of the file system to check for changes. Luckily, there’s an easier way to update the file’s location: use ContentResolver’s update() method.

Pagination Syntax in SQL Server 2012

When we started upgrading our SQL Server 2008 instances to 2012, I went back and reviewed the new development features that were added. There were many that I was excited to try out to see what performance and readability improvements they would bring. Among those were the pagination enhancements to the ORDER BY clause, which allow you to specify an offset and the number of rows you want. MySQL developers have long used the convenient LIMIT clause, but SQL Server developers have had to use subqueries or CTEs with ranking functions to achieve the same effect.
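As a rough illustration of how a page request maps onto the two approaches, the TypeScript sketch below builds the SQL Server 2012 `OFFSET ... FETCH` clause and computes the row-number bounds that the older ranking-function approach would filter on. The function names, the 1-based `pageNumber`, and the `LastName` column in the usage note are assumptions for the example, and `orderBy` is expected to be a trusted column name, never user input.

```typescript
// Validates that a paging argument is a positive whole number.
function assertPositiveInt(value: number, label: string): void {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError(`${label} must be a positive integer`);
  }
}

// Builds the ORDER BY ... OFFSET ... FETCH clause available from SQL Server 2012.
// `pageNumber` is 1-based (an assumption of this sketch).
function buildPageClause(orderBy: string, pageNumber: number, pageSize: number): string {
  assertPositiveInt(pageNumber, "pageNumber");
  assertPositiveInt(pageSize, "pageSize");
  const offset = (pageNumber - 1) * pageSize; // rows to skip
  return `ORDER BY ${orderBy} OFFSET ${offset} ROWS FETCH NEXT ${pageSize} ROWS ONLY`;
}

// Computes the inclusive ROW_NUMBER() bounds for the same page, as the
// older subquery/CTE approach would use in its WHERE clause.
function rowNumberRange(pageNumber: number, pageSize: number): { first: number; last: number } {
  assertPositiveInt(pageNumber, "pageNumber");
  assertPositiveInt(pageSize, "pageSize");
  const first = (pageNumber - 1) * pageSize + 1;
  return { first, last: first + pageSize - 1 };
}
```

For example, page 3 with 25 rows per page ordered by `LastName` skips 50 rows and fetches the next 25 with the new syntax, while the ranking-function version keeps row numbers 51 through 75.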

Before I started using the new syntax in my queries, I wanted to investigate the performance differences between the two methods, not only to justify switching over to the new way of doing things, but also to see what’s happening under the hood.