Give me back my sanity: How I had to install Pandoc in a CentOS Docker container (gist.github.com)
109 points by grownseed on Nov 17, 2014 | 63 comments


Congratulations! You are now closer to being Enlightened.

You now have a closer understanding of the hundreds of thousands of man-hours that distribution maintainers spend doing this every single day just to give you a working application. Keep in mind, of course, that none of what you did will replicate properly, because it doesn't use a spec file, doesn't record its build deps, run deps, or permissions, has no uninstall or upgrade path, doesn't use standard paths, probably skips a whole bunch of correct procedure and bug fixes, uses some real-time dynamic package managing/installing thingie, yadda yadda....

It's not that software development procedures are broken; actually, they're really, really well defined for most of the Open Source world. It's just that most humans have no clue how much work goes into giving them some 'basic' products. Kind of like how most people have no idea how much of a multinational effort goes into producing a bottle of Coke.

Could said tool be written with no dependencies? Sure. But that would be a total nightmare in and of itself. At the end of the day, code reuse and layers of abstraction are useful... as long as they're designed and used properly.


The funny thing is that, since maintainers do this all the time, they work faster, and he's apparently figuring it all out the hard way for the first time; so by the time he's done, the maintainers have likely already uploaded a new package. Probably. Or at least they could have.


It's usually not that simple. If you've already climbed the learning curve of knowing intimately how to maintain packages for your distro, it's a "simple" matter of updating the old .spec files for the newer versions and sending the distro maintainers your newer .spec. But there's no guarantee they'll incorporate your new package in a timely manner (and it will go into an -UNSTABLE tree anyway, which his CentOS 6 install isn't using).
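
For the mechanical part, the flow is roughly this (a sketch; assumes yum-utils, rpmdevtools, and rpm-build are installed and that a pandoc source RPM is reachable from your repos, and the spec filename is a guess):

    yumdownloader --source pandoc                 # fetch the src.rpm
    rpm -i pandoc-*.src.rpm                       # unpack into ~/rpmbuild
    $EDITOR ~/rpmbuild/SPECS/pandoc.spec          # bump Version:, add a %changelog entry
    spectool -g -R ~/rpmbuild/SPECS/pandoc.spec   # download the new source tarball
    rpmbuild -ba ~/rpmbuild/SPECS/pandoc.spec     # build the binary and source RPMs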


He is also studiously ignoring the work of the RPM packagers, which is making everything worse.


This is just a case study in why using CentOS for anything is like amputating your left leg - if you're going to do it, you better have a damned good reason, and no matter how justified you are, it's still going to suck.

But I don't disagree with the author's pain point. Computing is sort of way harder than it needs to be a lot of the time.


CentOS is great if you're looking for proven stability. But yeah I've had so many moments of hair-pulling frustration dealing with incredibly outdated CentOS packages (All of them?).


Yeah I'm not sure I understand the choice of OS here unless it was required for some reason.


I have CentOS 7 installed on my desktop computer!

There are dozens of us! Dozens! But I honestly don't have any major complaints.

1. Packages aren't THAT out of date (firefox is the latest ESR, gcc is 4.8)

2. Things tend to be nice and stable

3. I sometimes end up needing to compile some stuff from source even on faster moving distros anyway


That is because CentOS 7 was just released. Let's talk again four years from now.


Yeah, every time I ssh into a box and see "terminal type not supported" (screen-256color), I know it's using CentOS 5.x.

Which means ssh doesn't have netcat mode, rsync can't write to log files or show progress properly, and all my tmux bindings are messed up.


The thought process goes:

Centos -> RHEL -> "Security"

In the same way that Oracle is "Enterprise."


Ha, I see. And then muck up the "security" of your platform by putting a bunch of build tools and such on it, so you can do the thing you're actually trying to do. Rad.


CentOS is not that bad, and RPM specs are way, way better than kludgy Debian packages. And anyway, if you run a business you will probably require RHEL or CentOS to get any vendor support.


Sometimes, the easiest thing is to just compile a binary elsewhere, and drop it into the system. Especially when you are dealing with CentOS (RHEL, Scientific Linux, ...).

"Linux"* is usually forward-compatible, meaning that a binary compiled with an old system will run on a newer one. Maybe you have to bring all your own libraries with you, and place them on LD_LIBRARY_PATH, but you can get it to work. I can't count the times I had to extract .so files from RPMs I found on the net. Feels a bit dirty, but surprisingly works.

Actually, at one point I compiled my own glibc (simultaneously with GCC), and then the full GTK stack. I played a bit with RPATH so the binaries looked for libraries at relative, not fixed paths. Then I could just drop the whole usr tree somewhere, source a script (setting a few variables) and I was able to run current software, including current versions of Skype, Firefox, and LibreOffice on my ancient OS at work. There are binaries for the latter two, there is no way I would compile those, too.
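For anyone curious, the mechanics look roughly like this (a sketch; the library, RPM, and app names are hypothetical):

    # pull a .so out of an RPM without installing it
    rpm2cpio somelib-1.0.rpm | cpio -idmv './usr/lib64/libfoo.so*'
    mkdir -p ~/myapp/lib && cp usr/lib64/libfoo.so* ~/myapp/lib/
    # run the foreign binary against the bundled libraries
    LD_LIBRARY_PATH=~/myapp/lib ~/myapp/bin/myapp
    # or bake a relative search path in, as with the RPATH trick above
    patchelf --set-rpath '$ORIGIN/../lib' ~/myapp/bin/myapp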

It's a bit unfortunate that there is no "culture" of using binaries when necessary on Linux. If people were more aware of the use cases and details, the experience would be a lot smoother. For example, creators of binaries could compile against older system libraries to get maximum compatibility. Or, you could have bundles of libraries (distribution-agnostic, and not installed in /usr) to run new apps on an old OS. Ironically, Windows has the distribution of third-party binaries figured out much better... although I do not envy Windows for the installation situation there, and in 90% of cases prefer the package managers of the Linux world.

(* And here, I mean Linux distributions, not just the kernel. So Linux+glibc+GNU+bash+X11+related stuff, but maybe a few of those are swapped out for alternatives. Just a reasonably GNU/Linux-Like Posixy System, but not Cygwin or OS X.)

(* * Which I was stuck with for reasons out of my control.)


> The reason that there is no "culture" of using binaries when necessary

I don't understand what you mean.

Outside of Gentoo, most distros are based on binaries - I don't need to install GCC in order to install programs via 'apt-get' or 'yum install', nor do I need GCC in order to run them.

--

Assuming you mean 'culture of not using the built-in package manager', here's what I think:

While I've definitely done "scp a.out $other_system:" that only works for the most trivial of programs. After you accumulate a few more files, it becomes "scp foo.tgz $other_system:" because there're some library files/whatever to be included.

But then it turns out some of those files need to get put in the right place, so the next step is a rudimentary runme.sh script that sets up the system.

Hang on, what's the last version we 'installed'? Just have runme.sh stick version info in /etc/foo.conf.

Congratulations, you've got a rudimentary packaging system!
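And that runme.sh tends to look something like this (hypothetical, of course):

    #!/bin/sh
    # poor man's package install: copy files, record what we did
    set -e
    cp -r lib/. /usr/local/lib/
    cp bin/foo /usr/local/bin/
    echo "foo_version=1.2.3" > /etc/foo.conf
    ldconfig    # refresh the shared-library cache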

The reason there's no culture of avoiding the package manager is because it's there to help you so you don't have to manually hunt down rpm files to extract .so files, among other things. (If I never have to use rpmfind.net again it'll be too soon.)


Have a look at this:

http://ianmiell.github.io/shutit/

which is a means of capturing this complexity for automation purposes.

https://github.com/ianmiell/shutit

I'm working on a "shutitdist" to compile "everything" from source:

https://github.com/ianmiell/shutitdist

ShutIt can (among many other things) produce depgraphs like this:

https://raw.githubusercontent.com/ianmiell/shutitdist/master...


Thank you for documenting this so others don't have to go through the pain. The best thanks people could give would be to file bugs for each of your annoyances that are still relevant. Now that the problem is actually a bit more reproducible, the bugs should be relatively productive uses of the maintainers' time.


This is about the impedance mismatch between building everything from source and using distro package management. Mixing and matching is painful, as package managers correlate a lot of package metadata in order to provide smooth integration and interdependencies of multiple packages.

Building everything from source "by hand" quickly runs into scaling problems, and admins start to roll their own "package databases" in order to cope. Fully-baked package management becomes the solution. The problem here is that distro packages are not the latest development snapshots (for good reasons, most often and presumably). Still, there are times where newer-than-distro-repo software is required.

The author's Pandoc case may be one of those, but I doubt it. RHEL (and CentOS) packages should never (by definition) include showstopper bugs. It's more likely that bugfixes have been backported to the older, stable version. Still, it's possible that the author is correct, and the official package does not meet requirements or is otherwise unusable.

In that case, the best path forward is to build or reuse an updated package, rather than installing from development sources "by hand", e.g. `make install`. It's likely that someone else has already had this problem and has built an updated package, which you could then install, but let's assume not.

Now we're into the wonderful world of package building. We can avoid some of the author's pain here, but mostly there is a tradeoff for having to learn and deal with the packaging pipeline. The upshot is that package building may be scripted and synched with development sources, and you are keeping the metadata inside the packaging system where it belongs and can be managed (think: future updates).
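On the RPM side, "scripted" can be as little as rebuilding the source package in a clean chroot with mock, which resolves the build deps for you (a sketch; assumes the stock mock configs, and the version shown is a placeholder):

    mock -r epel-6-x86_64 --rebuild pandoc-1.13.1-1.src.rpm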


I can't help but compare this to OS X, where most applications are bundled together with any dependencies that you can't safely assume will exist because they aren't included in the OS.

Yes, it does waste some disk space. But the $5 worth of disk space that this approach costs me is more than justified by the $5,000 worth of gray hairs it saves me.

My only lament is that things still fall back to the annoying old-fashioned way of doing it whenever I need to install software that's more Unix than OS X. Which, given I'm a programmer, is still pretty much all of it.


Sure, and Windows is much the same wrt bundling. The tradeoff is that now you've multiplied the responsibility of library updates across everyone that bundled it. You probably won't care if simple library version updates are missed, if the app itself doesn't care. You might care more about bugs that require every app to update, especially if it's a security bug.


Sure there's a tradeoff, but for most people it's really rare to use software that isn't being actively maintained. So let the maintainers update their dependencies when there's a security flaw, the auto-update will pull in the changes, and I'll still benefit from OS X's 30-second install process.


The catkin and bloom tools developed for ROS (Robot Operating System) go a long way toward mitigating this pain: they make it much, much easier to maintain a source workspace of multiple federated packages, build them all together with dependency resolution and a single command invocation, and switch between the released (deb) and source version of a particular package.

Unfortunately, there isn't a good page which explains this all for the benefit of someone who doesn't care about ROS or robotics specifically, or who doesn't care about how catkin compares to the legacy system it replaced. But this is a start: http://wiki.ros.org/catkin/conceptual_overview


I think you're understating how hard it is for a beginner to create a spec for an RPM. As a n00b I recently did this for Sikuli on CentOS 6. The macros in the spec file gave me the most trouble. The close second was the build process, until I realized that specs were designed for maintainers, not developers.

Normally, someone has already written a spec that you can modify for your use case.

Given the number of Docker projects (looking at you, Redis) that build from source, Docker seems to be moving into package management.



The real problem is that all of the major package managers require root, and usually you can't set the install prefix to a folder other than '/' (for example '~/.local/') without using chroot (and again, root is needed there).

It is true that some packages DO actually require root (like obviously the kernel), but most really don't.

A package manager that works like this would be portable to any distro. I think Nix and Guix are supposed to do this.


CentOS's package manager, yum, does have the --installroot option to change the installation root.

However it will still want to update the global rpmdb, and thus require root for that.
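i.e., something like this (a sketch):

    # install into a private root; yum still insists on root privileges
    sudo yum --installroot=$HOME/centos-root install pandoc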


Just to reinforce this: RHEL/CentOS version numbers cannot be directly compared with the upstream projects' version numbers. That is the version that was picked as a base, and then bug fixes are ported back to it. In this case it is a single patch to deal with TeX support in CentOS 6, but for many other packages it can be dozens of patches to try to get a truly stable version.
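You can see those backports directly in the package's changelog, e.g.:

    rpm -q --changelog pandoc | head    # the fixes carried on top of the frozen base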


I've been there many, many times and have often wanted to live on an island with no technology too. But to offer the opposite perspective ...

Yesterday I set up Docker and Fig to provide me with postgres and mysql containers so I could run the integration tests for a small SQL-oriented module I wrote. Other than waiting for the images to download, it was an absolute breeze. In a matter of moments and with a simple config file, I get two databases at the drop of a hat, and they're not mucking with my main box at all. Sometimes, I really love software.
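For reference, the whole setup is about this much config (a sketch; image tags and passwords are placeholders):

    # fig.yml
    pg:
      image: postgres
      ports:
        - "5432:5432"
    db:
      image: mysql
      environment:
        MYSQL_ROOT_PASSWORD: secret
      ports:
        - "3306:3306"

    $ fig up -d    # both databases running, host untouched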


Setting up a private Docker registry though, especially with the (recommended) setup of an nginx proxy for basic auth and SSL... that's a complete nightmare. And I still can't get it to work =/


Maybe by design so you pay for Docker's private hosting? :)


Providing feedback in the right places can work wonders for both the individual developer and her community. For anyone struggling with CentOS Docker sanity, I suggest reaching out to the team with any questions on a related GitHub repo. https://github.com/CentOS/sig-cloud-instance-images/issues/

I recently did such a thing https://github.com/CentOS/sig-cloud-instance-images/issues/2... In addition to a resolution, here's the caliber of (free, unpaid) support I received:

> ... let us know what other containers would be useful to you? ... Finding out what's actually useful to folks helps immensely.


2 things:

- distro developers put a lot of work into making distros easy to use, which is why 'yum install pandoc' is simple and easy. We often take this for granted.

- when people create new apps they very, very often think they can do better than others and suffer from NIH / the +1-standard problem. Hence multiple libs, multiple package managers, and 203939 megs of bloat. Programs that are proud of 'make install' vs. 'make' + 'make install' generally have this behavior, obviously.

In this case I'd recommend testing with some recent version of Fedora. It's pretty close to CentOS but much newer. Of course, that means you'll have to go through distro updates more often.


I had the exact same problem on an Ubuntu server at work. Since I don't have root access, I was forced to compile from source. I gave up when I couldn't compile Cabal.

My solution was MultiMarkdown 4 [1]. It supports many of the basic Pandoc features I need (tables, etc.) and exports to HTML and PDF. Since it is written in C, compilation was straightforward. Ten minutes later, my document was ready. Couldn't have been happier.

[1]: https://github.com/fletcher/MultiMarkdown-4
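The build really is about that simple (from memory; assumes git, make, and a C compiler):

    git clone https://github.com/fletcher/MultiMarkdown-4.git
    cd MultiMarkdown-4
    make                       # plain C, no exotic deps
    ./multimarkdown --version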


Wildly the best thing I've found so far on dependency management: http://nixos.org/docs/papers.html


Yep, I just switched from Ubuntu to NixOS over the weekend. Solves these problems wholeheartedly.


Wow, I just read through the documentation for the Nix package manager and it's a breath of fresh air... source-based builds, but with the convenience of pre-compiled binaries, hashed with all the input parameters used to create them:

https://nixos.org/nix

Has anyone tried using this for development?


Yes, and it has been great. I recently had to diagnose a problem in some software and its dependencies, and I needed a quick way to try different build flags and dependency versions. Nix's deterministic builds (and having access to all of the package definitions in one location) are a huge help here: it only took a couple of seconds of configuring between each build, and I quickly isolated the responsible library/version out of a couple dozen deps.
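For a flavor of it (a sketch; attribute names vary by channel):

    nix-shell -p pandoc              # ad-hoc shell with exactly these deps
    nix-env -iA nixpkgs.pandoc       # or a per-user install, no root needed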


In cases where I need later versions of applications than what comes with my distro (Ubuntu 12.04 ATM), using the original source packages is very helpful, as they already correctly list all the dependencies and will install perfectly into the existing system.

So you grab the source package, update the version of the upstream sources, update the changelog, and build an updated package, with as many of the original OS dependencies as possible.

I don't know about RPM sources, but for debs, this is what I had the most success with.
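For debs, the flow is roughly this (a sketch; assumes devscripts and build-essential, and the version numbers are placeholders):

    apt-get source pandoc                 # distro packaging + old upstream source
    sudo apt-get build-dep pandoc         # install its build dependencies
    cd pandoc-<oldver>/
    uupdate ../pandoc-<newver>.tar.gz     # graft the packaging onto the new tarball
    cd ../pandoc-<newver>/
    debuild -us -uc                       # unsigned .debs, installable with dpkg -i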


This. The dumbass in the article doesn't know about packaging and made his life needlessly difficult.


CentOS is seriously in need of a better way to find and install programs not in the base OS.

Currently, if you want to install something that isn't in the official repos, you either need to compile from source or download a bunch of RPMs from random websites that may or may not be legitimate. (What is this, Windows?) None of the third-party repos have a decent search function, either, so it's difficult to tell whether they have the version I need. Heck, even the official repo isn't quite searchable on the web; you need to use yum on the command line.

Example: Using only a web browser, find out the latest version of OpenSSL in CentOS 6.5 and the list of its dependencies.
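From a shell it's two commands, which is exactly the complaint: there's no browser equivalent:

    yum info openssl                 # version available in the configured repos
    repoquery --requires openssl     # its dependency list (needs yum-utils)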

Compare this with Ubuntu, where nearly everything you could ever want is organized into easily searchable PPAs, and looking up a package on the web is as easy as going to packages.ubuntu.com/package-name. No manual downloading of packages from random websites. No need to dig into mailing lists to find out which version of bash actually fixes shellshock.

This is not a matter of rpm vs deb, or yum vs apt. This is not a matter of stable vs cutting-edge. This is a matter of the general ecosystem being totally haphazard, fragmented, and user-unfriendly.


This was similar to my experience with Haskell and Ubuntu. The Haskell ecosystem seems to have both severe version compatibility issues and issues with transitive dependencies ballooning out of control.

I mean, this is a tool that processes text right? Not sure why there should be hundreds of megabytes of dependencies at any step.

Actually I was expecting a rant about Docker... it didn't really have anything to do with Docker, as far as I can tell.


Hum. As someone who's fooled around here... this is the long way around for pandoc.

The 'proper' way would be to get the latest Haskell Platform in an RPM (I think?), and then, using that, install pandoc.

It's the case that anything I depend on for up-to-date software is, as a rule, compiled from source, using the latest tag. The distro maintainers are years out of date, always. Sometimes this matters... usually it doesn't.
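Roughly (a sketch; the catch is whether a Haskell Platform or GHC RPM actually exists for your release, e.g. via EPEL):

    yum install haskell-platform     # where available
    cabal update
    cabal install pandoc             # lands in ~/.cabal/bin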


I am not claiming this is going to be less painful than what the author experienced, but on Debian/Ubuntu, 'apt-get build-dep' can sometimes be very useful for installing the source dependencies of packages when you are compiling something newer by hand (compiling Emacs 24.4 by hand? 'apt-get build-dep emacs', then compile Emacs, as sketched below).
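i.e., for the Emacs example, the full flow is roughly:

    sudo apt-get build-dep emacs     # build deps of the distro's emacs package
    wget http://ftp.gnu.org/gnu/emacs/emacs-24.4.tar.xz
    tar xf emacs-24.4.tar.xz && cd emacs-24.4
    ./configure --prefix=$HOME/.local && make && make install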

On another note, I like the additional PPAs that some of the more popular open source projects provide, such as Node.js, Mongo, etc. These definitely help in installing _the latest_ version of the required software. However, maintaining PPAs for a lone-developer project can be a chore (although much of that can be automated via Launchpad).


In general, if you're going to run CentOS or RHEL, you need to either be ready to operate within their package versions, or be ready for a lot of pain. Unfortunately the tradeoff between "stability" and "new" is pretty harsh with these. It is my hope that Docker alleviates this somewhat, and I think it was really smart of the author to install this "big bag of source-compiled craziness" in a container instead of polluting the host OS environment.


> I would like to extend my gratitude to Google, StackOverflow, GitHub issues but mostly, the people who make the beer I drink

This should really just be my email signature.


I've had very similar problems building in-house Openstack RPMs for CentOS 6 (now using Juno.)

You might have better luck in the future just grabbing the el7 stuff (or Fedora 20) and building it using the CentOS spec file:

http://pkgs.fedoraproject.org/cgit/pandoc.git/tree/?h=epel7
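i.e., grab the newer .src.rpm and rebuild it on the older box (whether it builds cleanly on el6 depends on its BuildRequires; the version is a placeholder):

    rpmbuild --rebuild pandoc-<ver>.el7.src.rpm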


Somebody should contribute a Dockerfile to Pandoc upstream, so jgm can provide an official Pandoc Docker image (ideally as an automated build done by Docker Hub). Now, let the bikeshedding begin on which distro to use as the base. I vote Debian Wheezy, if it has a recent enough Haskell Platform.
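Something in this spirit, perhaps (a sketch; whether wheezy's haskell-platform is recent enough to build a current pandoc is exactly the bikeshed):

    # Dockerfile
    FROM debian:wheezy
    RUN apt-get update && apt-get install -y haskell-platform
    RUN cabal update && cabal install pandoc
    ENV PATH /root/.cabal/bin:$PATH
    ENTRYPOINT ["pandoc"]

    $ docker build -t pandoc .
    $ docker run --rm -v "$PWD":/work -w /work pandoc README.md -o README.html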


On CentOS it can be hard to stay current because yum tends to use older builds. I haven't found an ideal solution for all cases; what I do is build current versions from source, in `/opt/$program/$version/`.

I also use static linking when possible; this helps make the binary copyable to other similar servers. I also use Docker and/or Ansible and/or scripted installs when possible.

This overall approach does mean being committed to updating dependencies and rebuilding, which is especially important for security updates e.g. shellshock. It also means specifying environment paths so you get the versions you want, rather than the default CentOS yum versions. YMMV.
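Concretely, something like (a sketch; static linking only works where the build system cooperates):

    ./configure --prefix=/opt/$PROGRAM/$VERSION LDFLAGS=-static
    make && make install
    export PATH=/opt/$PROGRAM/$VERSION/bin:$PATH   # pick the version explicitly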


If the problem is that the version of pandoc in CentOS 6 is too old, then why not try to use a CentOS 7 container as a base? In the end aren't containers about choosing whichever OS image is better suited for a task?


Environment restrictions matter here. If you're running the latest Citrix XenServer (v6.2, like we do; as a business we're heavily committed to XenServer), then you're stuck at CentOS 6.4 (or Ubuntu 12.04 and Debian 7.0).

Try to run anything newer in any sort of mission-critical role, and if it screws up, you'll be told it's unsupported.

I'm keen on being able to support Docker, but CentOS 6.5 is the earliest version that has out-of-the-box support, and CentOS 6.5 won't be supported by Citrix until the next XenServer release.

But for us, as a smallish hoster targeting businesses with managed services, we value long-term support and stability, even if it means having to wait a bit for the fun stuff to arrive.


Isn't the Docker container all userspace with no kernel components? What is the difference between running a Docker container for pandoc with CentOS 7, and running a Docker container with CentOS 6.4 and a manually built pandoc? Isn't pandoc equally unsupported in both cases? In fact, as the OP mentions, the pandoc in CentOS 6 was broken, so if it is supported, why wasn't a solution provided via the support channels?


> Isn't the Docker container all userspace with no kernel components?

Yes but you still need a minimum kernel version to run Docker (2.6.32-431 or higher according to [0]).

Earlier this year when we were exploring Docker, CentOS 6.4's latest kernel version didn't meet that requirement so Docker was a no-go. CentOS 6.5 was the min version with a supported kernel but our Citrix XenServer environment (6.2SP100) had no official vendor support for >=6.5. We don't run anything on our Citrix environments unless it's properly supported by them.

Now because of your comment and doing my fact checking, I just found out that CentOS 6.4's kernel is now at version 2.6.32-504 which is damn fine news for us. That said we'd still be stuck with CentOS 6.4 containers because CentOS 7 containers would still be unsupported by Citrix.

[0]: https://docs.docker.com/installation/centos/


7.8.3 is the current version of GHC.

I have no idea if these are compatible with CentOS or RHEL, but I have install instructions for Fedora in my guide: https://github.com/bitemyapp/learnhaskell#fedora-20 that were contributed by davidfetter on github.


If the folks behind Haskell want it to be used in enterprise environments, I would think they'd want to release their own RPMs for recent versions of RHEL/CentOS/SUSE.


This sounds easy compared to some of the hell I've been through getting certain things configured, like converting an EC2 PV AMI to HVM. Or maybe using OAuth in any incarnation.


If you're OK with using conda (a Python package management tool), you can do 'conda install -c wakari pandoc' and it should just work.


I had to go through similar machinations with getting the latest subversion on rhel 5. I feel your pain.


So it looks like he needed CentOS 7 and installed v6.

"Oh, I installed years old distribution and it's years old! OMG!"


The _real_ problem is that Pandoc is written in Haskell.


In what way is that a problem?

Haskell is certainly a reasonable choice for a markup transpiler.


I agree, Haskell is a great language, but its ecosystem has much less support than C's. Luckily for us, Atwood made jgm write CommonMark in C.


Looking beyond the language itself, part of the problem was having to wrestle with Haskell's cabal. Now the question is whether that poison is worse than other languages' package managers (or kind-of package managers).


So you had to build a few deps, who hasn't had to do that?

I recall having to build a largish dependency tree to get a recent version of Gtk2 running on Solaris10, from Glib2, atk, cairo all the way up.

That's life.



