There are a whole lot of options out there today for ingesting PDFs and manipulating them programmatically, but they are definitely not all equal. With countless open source solutions and a handful of commercial ones, I was inundated with choices. Too many choices.
At a high level, my requirements were simple: Ingest a PDF, convert it to a format that is easily viewable on desktop and mobile platforms, and extract meaningful text and data from it. As it turns out, there are 3rd party tools and open source code to do each of those things, but none that could do ALL of those things in an elegant manner.
Without getting into the details, the PDFs we're processing are quite large and complicated, and our experience with hosted conversion solutions proved they were the wrong fit. I investigated iTextSharp because of iText's reputation in the Java world, but it simply didn't solve my problem elegantly. There is a decent community around the product, but as open source goes, there is no support and no one to get in touch with when things go wrong. And they will.
Although parsing PDFs is important to us and our customers, we're a small startup and we knew that it wasn't our core competency. We quickly realized that we should allow someone smarter than us to worry about understanding the PDF spec and spitting out all the things we needed.
Some research turned up Aspose as a possible solution. It was really easy to get started since all of their products are also available as NuGet packages. A few Install-Package commands later, and we were processing PDFs. The documentation provides loads of examples for dealing with attachments, fonts, pages, TOCs, forms, links, tables, etc.
We focused mainly on conversions during our trial and were able to successfully convert PDFs to SVGs while retaining the original document's quality. Additionally, we're now able to extract information from the PDFs that we can use to better understand the types of documents our customers are creating and submitting to us. We're now beginning to experiment with Aspose.PDF's ability to create PDFs so that we can build highly customized reports for both internal use and for customers.
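For a sense of how little code the conversion takes, here is a minimal sketch assuming Aspose.PDF's .NET API (the file names are placeholders and error handling is omitted; treat this as illustrative, not our production code):

```csharp
// Load a PDF and save it back out as SVG using Aspose.PDF for .NET.
// "input.pdf" and "output.svg" are placeholder paths.
var document = new Aspose.Pdf.Document("input.pdf");
document.Save("output.svg", Aspose.Pdf.SaveFormat.Svg);
```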
So far, we're very satisfied with the quality of the results we've seen. If you are looking for a full featured PDF library for Java or .NET, check out Aspose.PDF; if you are using some other language, they've got a Cloud API that performs the same actions.
At FieldLens, we run a very complex (read: Enterprise) Single Page Application on Rails hosted by Heroku. The application is 99% CoffeeScript and 1% Ruby which means that we’ve got a ton of client side scripts that need to be precompiled by Sprockets.
Heroku is nice enough to run assets:precompile for us when we push code to our Heroku hosted Git repository. For the past year, that has worked wonderfully but with our latest major release, we added a few new features and a bunch of new code to power them.
Suddenly our precompilation time went from a couple of minutes to over the 15 minute limit set by Heroku, causing our deployments to fail. After rolling back many revisions of our code and still being unable to get Heroku to compile our application, we started looking for other solutions.
We were pointed in the direction of the turbo-sprockets-rails3 gem and the related Heroku buildpack with the promise of compiling assets much faster due to the ability to only compile files that have changed since last deployment.
It also has a few other tricks up its sleeves, such as only running the compile phase once instead of the default twice.
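For reference, the switch itself is small. A sketch of the setup, assuming the gem name above and the commonly referenced buildpack URL (verify it against the project's README before using it):

```shell
# Gemfile: add the gem alongside your other asset gems
#   gem 'turbo-sprockets-rails3'

# Point Heroku at the turbo-sprockets buildpack
# (URL is the commonly referenced one; check the README for the canonical one):
heroku config:set BUILDPACK_URL=https://github.com/ndbroadbent/heroku-buildpack-turbo-sprockets.git
```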
After switching the buildpack to use turbo-sprockets, asset compilation on our first deployment was down to 2 minutes. That was amazing for us: we were able to deploy even faster than we could before things broke.
The power of turbo-sprockets really shows after the first deployment, however. On subsequent deployments, asset compilation took in the range of 20 to 40 seconds. Wow.
Note: The strange thing about our problem was that even when we rolled back our code to versions that were known to deploy “normally,” they continued to time out. At the time of this writing, Heroku support has not been able to explain this but apparently has engineers looking into it. In the meantime, our problems are solved and for all intents and purposes, we’re better off now.
My team and I have been learning about AngularJS and how we would use it as a replacement for our Backbone.js based application. Here is a list of resources that we've found to be helpful in learning about AngularJS:
- http://www.egghead.io/ - John Lindquist has created over 40 videos around 3-4 minutes each showing various aspects of building an AngularJS application, including Filters, Directives, and different usages of Angular's $scope. Highly recommended and considered by many to be the best introduction to AngularJS on the web.
- https://www.youtube.com/watch?v=i9MHigUZKEM - Dan Wahlin takes you through the fundamentals of building Angular applications in this 70 minute video. There is a lot of information to digest in a single video, but it is spread out over the full 70 minutes. I prefer Lindquist's shorter videos, but this is still another great one to watch.
- http://docs.angularjs.org/guide/concepts - Straight from the AngularJS docs, this (very long) page explains all of the concepts of an AngularJS application, including scopes, controllers, models, views, directives and more. If you are going to build an Angular application, you absolutely must read this page first.
- http://www.yearofmoo.com/tags/AngularJS.html - The Year of Moo site has a handful of great articles about AngularJS. My favorites are Use AngularJS to Power Your Web Application and More AngularJS Magic to Supercharge your Webapp.
- Building a large AngularJS application isn't easy. If you plan to write one, this blog post will give you a few good ideas about how to best structure your application for long term maintenance.
- http://onehungrymind.com/notes-on-angularjs-scope-life-cycle/ and http://jimhoskins.com/2012/12/17/angularjs-and-apply.html - These posts both attempt to explain one of the most complicated aspects of AngularJS - Scope. The authors explain some of the functions associated with $scope and when to call $apply.
- http://blog.artlogic.com/2013/03/06/angularjs-for-jquery-developers/ and http://stackoverflow.com/questions/14994391/how-do-i-think-in-angularjs-if-i-have-a-jquery-background - If you are a web developer, you have undoubtedly used jQuery. If you are looking to start using AngularJS, you need to start thinking differently about how you manipulate your DOM. Angular has some great support for this but if you try to build things the jQuery way, you'll quickly lose the benefits provided by AngularJS.
- http://deansofer.com/posts/view/14/AngularJs-Tips-and-Tricks-UPDATED - True to its title, this is chock full of tips and tricks, with code to go with them. A great resource to refer back to once you get your app up and running.
- http://blog.angularjs.org/2012/07/introducing-angularjs-batarang.html - This is an awesome Chrome extension that gives you visibility into your application. It provides performance information as well as the ability to drill into the Scope of various controllers and directives on your page simply by pointing and clicking. A must have for anyone building an Angular application.
- https://groups.google.com/forum/?fromgroups#!forum/angular - This is the official Angular forum hosted on Google Groups. Great place to get into discussion with other developers as well as core developers.
- https://plus.google.com/+AngularJS/posts - This is the official Angular Google Plus page where people often post articles and links to Angular related news.
- http://www.cheatography.com/proloser/cheat-sheets/angularjs/ - This is a pretty cool cheat sheet for AngularJS that lists out all of the built in filters, directives and angular.* methods, their syntax, and what they do. It's a great resource for anyone getting started who may not know all of the tools they have at their disposal.
- https://github.com/angular/angular.js/wiki/JsFiddle-Examples - This is a list of JsFiddles for some common things you might want to do in AngularJS. The cool thing about being hosted as a JsFiddle is that you can change the code and re-run it to see what your changes do without needing to set up a whole Angular app on your own machine. The Fiddles also serve as code samples of functionality that you can copy and paste into your own app.
When we began building FieldLens in early 2012, the natural choice for our single page application was Backbone.js and Underscore. Backbone was great because it did not force us down any particular path but still gave us a nice foundation for building our application. This was especially important for us as we had not completely figured out how our application would do the things we envisioned it doing.
Unfortunately this was before libraries like Marionette and Thorax were available & mature, so it was up to our small team to figure out how best to develop a very large maintainable and performant enterprise application.
Building a framework ourselves enabled us to build something streamlined and specialized to do the things we need to do. It is not something we could ever extract into an open source project, unfortunately. This is exactly what makes Backbone awesome. It is also the same thing that makes Backbone a problem.
Sometimes you need structure, and Backbone.js does not give it to you. It gives you a few guidelines to follow but does not enforce them. In fact, the only structure that Backbone promotes comes in the form of Backbone.Model. Data models are the bread and butter of any web application, so it's great that Backbone.js has built in support for them. However, the structure of Backbone's models simply did not work for our application.
Backbone Models assume that your models will be structured the way Rails typically presents data models. This makes sense, since Backbone.js was extracted from a Rails application, but we're not a Rails application. We have a Java/Spring based RESTful API that serves up small pieces of data per API call.
What did we do?
Our solution was to build our own FieldLens.Model class that enabled us to build models that were comprised of multiple HTTP requests and responses. Initially we inherited from Backbone.Model but quickly realized that we were overwriting almost every method. So we decided to ditch Backbone.Model altogether. Additionally, the structure of our data did not work well with Backbone.Collection either. We were left basically only using the View and Router functionality provided by Backbone.
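To make the idea concrete, here is a hypothetical sketch (not FieldLens's actual code) of a model whose attributes are assembled from several HTTP responses instead of a single Rails-style resource. Each "source" is just a function returning a promise of a partial attribute hash:

```javascript
// Hypothetical sketch: a model built from multiple HTTP requests.
// Each source is a function returning a promise of a partial attribute hash.
function CompositeModel(sources) {
  this.sources = sources;
  this.attributes = {};
}

CompositeModel.prototype.fetch = function () {
  var self = this;
  // Fire all requests in parallel and merge the partial responses
  // into a single attributes hash once they all resolve.
  return Promise.all(this.sources.map(function (source) {
    return source();
  })).then(function (parts) {
    parts.forEach(function (part) {
      Object.keys(part).forEach(function (key) {
        self.attributes[key] = part[key];
      });
    });
    return self;
  });
};
```

Usage would look something like `new CompositeModel([fetchUser, fetchPermissions]).fetch()`, where each fetcher hits a different API endpoint.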
Maintaining Views is, in my opinion, the most difficult part of a Backbone application, because you must make sure that you properly dispose of your views, detach events, dereference objects, and so on, or you absolutely will run into memory leaks. Today there are a number of libraries that sit on top of Backbone and abstract away these problems, including our own. We built a sophisticated FieldLens.View class that all of our views inherit from; it does all of the aforementioned things and more.
So the end result of over a year's worth of work is that we use a fraction of Backbone's functionality, having spent a fair amount of that time building our own framework on top of Backbone to abstract away its inefficiencies. And we still have little structure.
Where do we go from here?
Now that our V1 is off the ground, we're all thinking about the next version of our app. That includes a brand new look and feel, a responsive approach so that our app works nicely on a wide range of devices, and a better framework for building V2. Now, we know what our application does and how it should work and we've learned a lot over the past year.
We've begun evaluating our options, including Ember, AngularJS, and Backbone.js augmented by Marionette or Thorax. So far, we're loving AngularJS. It provides great structure in the form of Controllers, Directives, Services, etc. It provides structure around clear separation of concerns. It is well tested (i.e. comes with a test suite for all internal code) as well as promotes building applications in a testable way, so that we can ensure that the code we are writing does what we think it does.
Contrast that with Backbone.js, which does not provide structure around what code goes where, offers no separation of logic, templating, and UI interaction, and does not promote testability. These things are all up to the developer to figure out and do "the right way."
AngularJS is a well thought out application framework. It is opinionated. That may turn some people off, but it is built this way to promote best practices, so that you can build a maintainable and performant application from the ground up without needing to worry about how to do those things yourself. It gives you the opportunity to worry about building an awesome application, not an application framework.
By default, IIS7 has built in support for various webfont MIME types, so you might not notice any problems at first. But once you try to serve these files over a CDN such as Akamai or CloudFront (like I was), you'll quickly notice that your web fonts do not load properly in IE or Firefox. The reason is that these browsers (and maybe others) follow the Cross-Origin Resource Sharing (CORS) spec, which prevents a page hosted on one domain from using certain resources (web fonts among them) hosted on another domain unless that domain explicitly allows it.
In other words, if your page is hosted at http://www.mydomain.com and your CDN provides a hostname such as http://mycdn.somecdnprovider.com, there is a good chance that you will not be able to access your web fonts correctly.
The solution is easy: use the Access-Control-Allow-Origin header on your resources to tell the browser which origins are allowed to use them.
If you are hosting an ASP.NET application in IIS7+ you can easily add this header by modifying your web.config file as follows:
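A minimal sketch of the idea (this global version sends the header with every response; in practice you'll likely want to scope it to font files, for example with a URL Rewrite outbound rule, and replace `*` with your actual origin):

```xml
<configuration>
  <system.webServer>
    <httpProtocol>
      <customHeaders>
        <!-- Allow any origin to load these resources; replace * with
             http://www.mydomain.com to lock it down to your own pages. -->
        <add name="Access-Control-Allow-Origin" value="*" />
      </customHeaders>
    </httpProtocol>
  </system.webServer>
</configuration>
```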
You’ll also need to have the URL Rewrite module installed.
Block strings and Multi line strings
By default, the CoffeeScript compiler wraps all of your code in a self executing function. The beauty of this is that it prevents developers from accidentally polluting the global namespace. This can be disabled in the compiler - but why would you?
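To make the wrapping concrete, here is roughly what the compiler emits for a one-line file such as `count = 1` (output simplified slightly):

```javascript
// What the CoffeeScript compiler emits (roughly) for `count = 1`:
// the whole file is wrapped in a self-executing function, so `count`
// stays local and never leaks into the global scope.
(function() {
  var count;
  count = 1;
}).call(this);

// Outside the wrapper, `count` does not exist:
console.log(typeof count);  // "undefined"
```

Compiling with the `--bare` flag skips the wrapper, in which case `count` would become a property of the global object.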
Compile type safety
Scott Hanselman has written a great post detailing how to include MiniProfiler in your ASP.NET application. I tried the profiler out and it works great. It's lightweight and easy to configure, but by default it does not give you very much information about your requests and method calls.
Admittedly, it is a “mini” profiler, so you cannot expect everything you would get from a full featured out-of-process profiler, such as memory usage, but this thing can go pretty far.
Unfortunately, I did not want to add using() statements all over my code in every method I want to profile. Being able to mark methods with attributes would be great, and ASP.NET MVC lets you do just that, which makes it easy to mark your base controller with an attribute that profiles your actions. This does not, however, give you any information about the methods called from within the action.
C# has no built in way to intercept method invocation, so the only way to accomplish this is to use some sort of Aspect Oriented Programming (AOP) library/utility. There are a few out there, including:
- Castle DynamicProxy (the library behind Castle Windsor's interception)
- and a few other smaller players
I’ve decided to use PostSharp Community Edition. It is a free, commercially backed product, and it works pretty well. With PostSharp, you can define attributes (or aspects) that allow you to run code before or after a method is invoked and to catch and handle exceptions. It does this by post-processing the IL generated by the C# compiler, modifying the code where necessary to make it work as you intended.
It does add an additional step to the compilation, but if you need this sort of power, it's worth it.
Here is the code for using PostSharp and MiniProfiler together:
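A sketch of the aspect, assuming PostSharp's OnMethodBoundaryAspect and MiniProfiler's Step() API; the details are illustrative, but the attribute name matches the [ProfilingAspect] referenced below:

```csharp
// Profiling aspect sketch: wraps every decorated method in a MiniProfiler step.
[Serializable]
public class ProfilingAspect : OnMethodBoundaryAspect
{
    public override void OnEntry(MethodExecutionArgs args)
    {
        var profiler = MiniProfiler.Current;  // null when profiling is disabled
        if (profiler != null)
        {
            // Stash the timing step on the invocation so OnExit can close it.
            args.MethodExecutionTag = profiler.Step(
                args.Method.DeclaringType.Name + "." + args.Method.Name);
        }
    }

    public override void OnExit(MethodExecutionArgs args)
    {
        var step = args.MethodExecutionTag as IDisposable;
        if (step != null)
        {
            step.Dispose();  // disposing the step records the elapsed time
        }
    }
}
```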
And wherever you want to profile, you can mark either an entire class or specific methods with [ProfilingAspect], and those method calls will show up in your MiniProfiler results.
Pretty easy, isn't it?
I recently had to write some code that reached out to both Google and Bing, performed searches on both, combined and manipulated the results, and returned the result to the client via JSON.
My problem was that sometimes either of those two services could be slow and there is a possibility that I may need to add additional data sources at a later time. The code originally looked like this:
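A sketch of that original, sequential shape (SearchGoogle, SearchBing, and CombineResults are hypothetical stand-ins for the real service calls):

```csharp
// Hypothetical reconstruction of the sequential version.
public ActionResult Search(string query)
{
    var googleResults = SearchGoogle(query);  // blocks until Google responds
    var bingResults = SearchBing(query);      // doesn't even start until Google is done
    return Json(CombineResults(googleResults, bingResults));
}
```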
As you can see, this code would accomplish the goal without a hitch. However, if Google took longer than usual, Bing would have to wait for Google to finish, which creates a bad experience for my users. I needed a way to fire off both searches at the same time without blocking one another; to be of any use to my users, though, the action would have to wait for both of them to complete before combining the results and returning them to the client. This problem would only be amplified by adding additional data services (like Yahoo, for example).
Even though I had done this several times in PHP, I had no idea how to accomplish it in C#. I know that AsyncControllers exist and it sounds like they rock, but what if I wanted to keep this all in the same controller as other related code?
This is what I decided on, and it works great:
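A sketch of the approach (Task, Start(), and Task.WaitAll() are the real .NET 4 APIs; SearchGoogle, SearchBing, and CombineResults are hypothetical stand-ins for the real service calls):

```csharp
// Run both searches in parallel, then wait for both before responding.
public ActionResult Search(string query)
{
    List<Result> googleResults = null;
    List<Result> bingResults = null;

    var googleTask = new Task(() => { googleResults = SearchGoogle(query); });
    var bingTask = new Task(() => { bingResults = SearchBing(query); });

    // Fire both searches off at the same time...
    googleTask.Start();
    bingTask.Start();

    // ...then block until both complete, giving up after 10 seconds.
    Task.WaitAll(new Task[] { googleTask, bingTask }, TimeSpan.FromSeconds(10));

    return Json(CombineResults(googleResults, bingResults));
}
```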
You can create as many Tasks as you need and then start each of them with the Start() method. Then, using the Task.WaitAll() method, you instruct your program to wait until all of those Tasks have completed before continuing to the next line of code.
You can also specify a timeout as the second parameter of Task.WaitAll to prevent your Tasks from running indefinitely, which is definitely helpful for Ajax requests when the client is waiting on some response.
As I’m sure many people have experienced, I receive a handful of recruiter emails every week. I categorize these messages into two groups. The first group is very untargeted, promoting positions for things I am not qualified for. Take a look at my resume, my blog, my Twitter stream, and/or my LinkedIn profile for a minute and you’ll see what I am proficient in and capable of. Did I say that I was a ColdFusion developer? No, I didn’t. So don’t waste my time (or yours) offering me a Senior ColdFusion Engineer position.
Then there is the second group of messages I get. Many of these unsolicited messages say nothing more than “Hey, I have a great position to tell you about. Let’s get on the phone and talk about it.”
Again, no. I don’t want to get on the phone with you. I don’t want to hear what you have to say. I don’t know you and I don’t trust you. Why? Because it’s most likely that you don’t know what you are talking about. You are a recruiter, not an engineer, so please don’t pretend that you know anything about writing code or solving difficult problems. I don’t want to waste my time hearing you pitch some startup to me, or pitch what you consider to be a “startup” either.
Instead, tell me who you are recruiting for. I know startups and all I need from you is to tell me who the company is. I’ll decide if they are really a “startup” with the startup culture, mentality and environment. I don’t need you to try to convince me of it over the phone. I can do my own research.
Don’t worry! I’m not going to go around you. You’ll get your commission, but if you want to work with me, it won’t be all on your terms.
“Scalable” is an interesting word that gets thrown around all too much in the development/information technology world. I recently had a discussion with a co-worker about the possibility of building a brand new set of services on top of our old, rotting code base. My opinion is that we should take a step back and take a serious look at the current code base. I don’t believe that it is a good idea to build an important new service (or set of services) on top of the current code, because it was “architected” by several out-sourced junior developers 5 years ago.
The code is… garbage. Some of the recent stuff is good and usable because it was developed by seasoned engineers. The older stuff is so poorly written that every time our product team asks us to add a new feature or enhance an existing one, getting the product out the door takes months when it should take weeks, and weeks when it should take days.
The co-worker in question continually argued that the current application has scaled quite a bit over the past 5 or 6 years and is fully equipped to scale further, and that over the lifetime of the codebase we have added many new features. The thing that kept jumping out at me was his use of the word “scalable” to describe our application’s ability to handle more traffic, and nothing more.
This made me think – is the application the only thing that needs to be scalable? I think the answer is unequivocally “No.”
In our case, we can scale the application to handle more requests but we cannot scale the development team. That is our problem – scaling the development of our software is impossible. Like hardware scaling, development scaling has two options and for the sake of simplicity we’ll call them by the same names.
Vertical scaling: Hire more developers. It’s easy to find low quality people to work on low quality code, but they still cost a decent amount of money. This also creates a single point of failure: because the code is so complicated, it’s difficult (if not impossible) to find people who can learn the old code quickly enough to build something on top of it. In my experience, few developers were willing to work in such a disaster of a codebase. We hired a number of developers who quit after just a couple of months. It’s not fun or interesting to be in “bug fix” mode forever.
Horizontal scaling: Use the engineers/architects/developers you already employ and take the time to fix the code (or write new code). This is harder and can’t always be done (in our case, it’s a viable option). It’s also harder because you would most likely need to consider all the things your software does now and recreate that experience. But you eliminate the single point of failure noted above because, if done right (and hopefully it is, since that’s why you’re doing it in the first place), you’ll be able to hire any eligible developer, and they should be able to pick up the new, modern, well architected code much quicker than in the scenario above.
The ability to rebuild from scratch is rarely an option in the software development world, but when it is, you should seriously consider it. When you hit the point where you can no longer scale your development to a level at which you can easily and quickly iterate on your application or build new features, you should think about your options and find out if rebuilding makes sense. It might not, and you might be stuck working with unscalable software, even if it does handle the load and generate cash flow.