While I really like K/Q/J/APL (some of this is Stockholm Syndrome: I spent the first 3 years of my career writing K/Q at a trading firm, eventually becoming the human debugger for my team's K/Q "investment". It got so annoying that I worked on a little parser library to help me debug: https://github.com/kiyoto/ungod), they are doomed from an adoption/future-viability point of view.
1. Steep learning curve without benefits: APLs like K/Q/J optimize for concision to the point of unreadability. Sure, it might be fun to decipher clever K/Q one-liners, but if you have to do it every day for work in production, it's just exhausting. This lack of readability has limited adoption and the growth of the user community. For any technology to have a viable community, it needs to be usable by a critical mass. K/Q in particular, and APLs in general, never had that, especially with more user-friendly alternatives like R/Matlab available for scientific computing.
2. No open source community: There's Kona (https://github.com/kevinlawler/kona) by Kevin Lawler, but again, its community has stayed small because the underlying language is not designed to be usable by enough people.
3. Domain specificity of array languages: These days, array languages are just too limited in scope. Scientific computing, the bread and butter of any array programming language, is well served by other languages (e.g., Python), making array languages neither powerful nor compelling enough to use: most of their strengths can be found elsewhere, and their weaknesses are too crippling.
I heard that Arthur Whitney found his successors (a pair of Russian programmers from St. Petersburg, IIRC), so K/Q will likely be around beyond his retirement. That said, I think its user base will continue to shrink for the reasons above.
The J language is open source, which makes it viable without a company to support it.
Also, terseness may be a problem, but only when you're not used to it. Normal languages use a vertical structure, while APL-family languages use a horizontal one. It is just a matter of looking for meaning in each line, instead of combining lines to get the meaning of the code, as we do in standard languages.
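To make that concrete, here's a q line of my own (q is K's sibling); it reads as a single right-to-left pipeline where a scalar language would use a loop:

sum 1+til 10    / til 10 is 0 1 2..9; add 1 to each; then sum: 55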
Ook is open-sourced too, and terseness is always a problem when you intend to work in a group of people or over long durations.
Maybe you're one of the happy few that spends more time writing code than reading it, but for an overwhelming majority of devs, the opposite is true. That's why even small readability improvements matter much more than a couple cool tricks a language may have.
A very long time ago I heard that C is completely unreadable[1]. I went and learned it some time later and discovered that this was bullshit. Then I heard that C++ is unreadable[2] (I mean the source, not the error messages...). I went and learned it and realized this, too, was bullshit. Another couple of years on, I heard that it's impossible to read PHP code[3]. So I learned PHP (and started earning money thanks to it) and realized that "impossible" is just a word. I then heard that indentation-delimited blocks in Python make the code impossible to edit (and unreadable, btw)[4]. I learned it and realized that it's rather easy to configure your editor and make this a non-issue. Around that time I heard that Perl is a write-only language[5]. I wanted to check, so I went and learned Perl. It took some time, but eventually I realized that, indeed, it's possible to read Perl.

To make the rest of the story short: around that time I decided I wanted to go polyglot and started researching various niche languages, like Lisp, Forth, Prolog, J, Factor, Avail, Haskell, Erlang and so on. I heard (mostly) that I was insane and that it's impossible to either read or write any such language[1][2][3][4][5]. As you probably guessed already, I went and learned most of them and realized that - shocking! - it is possible to both read and write all of them!
Now, I don't know either K or Q, but I have a very, very, very hard time believing that they are too terse and unreadable because of that.
In my experience, the "improvements in readability" are of course good, but they only shorten the time to master the language. Once you learn the language well enough, it becomes readable. And the very few syntactic issues which are objectively hard to distinguish or confusing to the eye tend to be worked around with formatting and font settings.
Written Japanese is totally impossible to read if you don't know it - and it takes considerably longer to learn. However, once you know it, it's readable. A Japanese person reading a newspaper in Japanese uses as much effort as I do reading a newspaper in my native language.
[1] From Pascal and later Delphi users.
[2] From C users.
[3] From Perl users.
[4] From PHP users.
[5] From Python users.
Most of the time when I see a J programmer "read" some J code, it really means taking the spec and slowly, incrementally building up towards the same solution as the code they were handed. I have not seen them have much success at "reading" more than a couple of lines of idiomatic code without being told what it does. I doubt it would scale beyond a few lines of code (≈ a few "paragraphs" of code in Blub), having seen a veteran mistake some quartic-time code for the Floyd-Warshall algorithm (which is cubic time).
I have not found this definition of "read" to be common in other language communities.
I've been writing q code for more than 6 years now. It's definitely possible to write code which doesn't look like line noise. It's easy to build an automated testing layer on top of your q codebase too (via the C or Java bridges). I feel it's a perfectly legitimate language for high-performance computing.
Edit: I recently started learning the J language. While it may not necessarily be a language you want to write your production code in, it teaches powerful concepts - it lets you write functions which can deal with radically different shapes of data with ease, via operators that change the behavior of a verb as a monad or dyad. It provides great insight into higher-level descriptions of your domain.
RE the learning curve: Do you think the APL family can exist without the terse syntax? Are there any aspects of the language that just wouldn't work well without it?
A comparison I can think of is that Lisp's whole code/data duality works well because the code is s-exprs, and it would be pretty awkward to transform if the syntax were more Python-y.
On the other hand, semicolons and braces have nothing to do with any Java features....
Yes, most of the examples look very limited compared to Lisp, but there is Rebol/Red (http://www.red-lang.org/), which is at least as powerful as Lisp in terms of homoiconicity (as well as in other things), if not more. Red/Rebol don't use sexps, don't look like this (((()))), and are highly dynamic and homoiconic.
> RE the learning curve: Do you think the APL family can exist without the terse syntax?
So yes, I think that the APL family could exist without the terse syntax, but hey, everything has its tradeoffs and all this is subjective - everyone has their own way of thinking - there might be someone in this world who is in love with the APL syntax :D
I do love APL's syntax. It's part of the appeal for me.
That said, Red looks like a godsend. I was looking for something I can use without much setup and learning costs, that is small, elegant, efficient, fast, gets the job done, and is long-term maintainable. Red seems to fit the bill. I'm still evaluating it and it still looks good.
> Do you think the APL family can exist without the terse syntax? Are there any aspects of the language that just wouldn't work well without it?
Considering the evolution of APL, yes.
Ken Iverson was teaching math at Harvard when he grew the - mathematical - notation to describe the subject formally. This was happening in the late 1950s, as computers were entering our lives. At some point it became reasonable to use that evolved notation as A Programming Language - or APL, for short.
However, the idea was to make understanding easier - all along! It's the same idea Alan Kay is pursuing with his works, the same idea behind many good technologies when they are young - Java, JavaScript... The importance of notation - be it in math, or in formal-and-executable math - is the ideology of early APL. Now they say "it's hard to beat the expressiveness of a whiteboard", but still attempt to use, say, an integral sign (and, similarly, one-letter variables), as one would do when showing the idea on a whiteboard. The history of hand-written math preferred terse notation - and that is carried on in APL as executable math.
So... yes, terseness is important, and APLers will also say that the choice of symbols is important too. It's only to beginners that they feel strange - pretty soon a programmer learns them, gets used to them, and refuses to substitute them with more readable, longer variants - even though that would make certain things better. You don't name variables on a whiteboard with long_names or camelCase - at best you use sub- and superscripts. Granted, you can have cups and power towers and other things, and they were really impractical with 1960s-level technologies... yet at least you have a single text direction for expressions, in editors or in print.
J makes certain things more logical and switches to ASCII. Maybe we'll invent an even better notation (to me, this - http://matt.might.net/articles/discrete-math-and-code/ - looks like a good start for thinking about basic building blocks), but so far, we have APL-family languages as arguably the closest thing to math notation. To some, it's the shortest path they can imagine between forming a solution in the head and explaining it to a computer.
In agreement with @avmich above, and to turn the question around:
> Do you think the APL family can exist without the terse syntax? Are there any aspects of the language that just wouldn't work well without it?
Do you think mathematics would exist or be shared and done as efficiently if it did away with the terseness of its symbols, Greek letters, etc...?
Yes, in J you can define average as:
+/%#
so that,
(+/%#) 2 3 4 5
returns 3.5
Or,
avg=: +/%#
avg 2 3 4 5
returns 3.5
Or,
sum =: +/
divided_by =: %
tally =: #
So, if you want to take baby steps into J you can:
avg=: sum divided_by tally
avg 2 3 4 5
returns 3.5
But that defeats the purpose of being able to concisely manipulate abstractions to get work done, all for the sake of readability, and ONLY for those people who refuse to learn the symbols, as in mathematics. Have you ever heard an adult who has had basic mathematics say, 'I have to keep looking up that Greek letter π'? It only takes me a very little while to review my concise code. I'd rather learn this sort of pattern recognition of short code when dealing with high abstractions than deal with the spaghetti-like appearance, to me, of Java or C (and I like C!). Strangely enough, I find Lisp clearer than C-like langs. It may be more than just subjective preference; perhaps it's the order of reasoning in the syntax.
Most adults give up on maths around the level where the number of symbols starts to grow, or slightly before (to be clear, I'm not claiming notation is the only cause of this - it certainly isn't - but it does mean we really don't know whether most people would manage to pick it up; personally I think it is a major factor in making people struggle with maths).
For my part, the notation was certainly part of it. It matters. I used to love maths but I started finding it impossible to read even as the concepts were still easy enough to understand. I'd write things out as programs instead to understand it without being hampered by the notation. I learned symbolic differentiation that way, for example.
But I quickly realised that this basically closed maths off to me as a viable subject, and I opted out of all optional maths courses other than boolean logic for my CS studies.
I get that for those who find mathematical notation easy to work with, it seems indispensable, but don't underestimate the number of people for whom the notation is the barrier that makes mathematics inaccessible.
I've picked up quite a bit of maths since, but always by understanding the concepts through code rather than trying to parse mathematical notation.
I've had - and still have, to a degree - similar troubles with understanding math notation. Some 9 years ago a friend, a solid mathematician, gave me good advice - at least try to pronounce the math expression aloud. That forces you to pay attention to each symbol - instead of trying to immediately find word-like "phrases", failing, and missing the expression as a whole.
APLs are - as Arthur Whitney likes to put it - supposed to be read symbol by symbol, since each symbol represents a whole operation. That's why APL programs are so dense - you may have plenty of work done in a short line (though many APL writers prefer to keep lines short, to help with understanding). A screenful of APL can be a moderately sized program, with no need for scrolling... When you write J, you might be tempted to add more and more operations to the left (APL and J evaluate from right to left), until it's hard to "explain" what the whole expression does - because it does so much. With the alternative - short, understandable expressions - you have another problem, that of naming :) - trying to give a traditional, long-worded name to an intermediate result doesn't fit well with the rest of the style... So a good sense of balance is very valuable.
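For instance (a q example of my own; q shares K's evaluation rules):

2*3+4            / 14, not 10: no operator precedence, strictly right to left
avg 1+til 100    / 50.5: reading from the right: enumerate 0..99, add 1, average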
Another saying which may help in understanding APL's mindset is that APLers spend 5 minutes writing a program, then spend the next hour writing the same program better and cleaner. That's what I understand as refactoring; the idea is to make the code more readable, the expression more obvious, more obviously correct, more reusable... Properly done, the expression is a pleasure to look at and well worth the effort to decipher symbol by symbol. Another great option is documentation - lots of comments, just as in good math texts, where the idea is explained in plain English and then succinctly put into code.
This way I better see the math background behind the problem. I certainly have an easier time changing something here and there - the changes required are so small. I also see similarities between different pieces of code, if they are put as similar expressions close to each other, on the same screen. Yes, the notation is terse... but it has its good sides.
Musicians learn to read music. Yes, it is hard to learn it first, before playing or listening, but that is just how it has been (incorrectly) taught. Same with mathematics. Learning mathematics by coding is perfectly fine, but when you move up to more abstract levels, you need abstract symbols. That can, and should, come later, I believe. But I would rather learn C=2πr as standing for the circumference of a circle than write it out longhand. Not to mention, diagrams are good too, for understanding what a 'radius' or 'pi' is. The longhand version would be at least a paragraph; you just wouldn't write many pages of longhand to avoid symbols. Fear of mathematics is a result of poor teaching, not symbols.
You are exactly right. My favourite is when resistance to APL-family languages comes from people who claim to be polyglots, and you ask them which languages and they list: C, C#, Java, JavaScript, VB, Ruby, Python, etc. Those are all really the same language with slightly different wording and details. Like German and English. The APL family is like Mandarin or Japanese. Lisp is like Latin (in fact Lisp is really a mechanism for authoring your own language with s-exprs). Obviously, these languages won't be comprehensible to you until you put the effort into learning the abstractions... this isn't like transitioning from C++ to Java, where you can broadly carry the same concepts over.
There is nothing difficult about reading these languages for people who use them, any more than it is difficult for a musician to read sheet music. In fact, APL languages have less ambiguous parsing and precedence rules, which I find makes them easier to read.
> Those are all really the same language with slightly different wording and details. Like German and English.
I'd argue you only think that because they've opted for a familiar syntax. E.g. the object model (or lack of one) is vastly different between these languages and they employ drastically different type systems.
Ruby is closer to Smalltalk than C in most respects other than syntax, for example, and provides most of the abilities lisp gives you, including the ability to define domain-specific languages. The major aspect you're not getting from Ruby would be homoiconicity.
Here's an article comparing Lisp and Ruby[1].
I'm not saying learning languages outside of this group isn't important, but that a lot of the reason why languages like Lisp and APL are seen as so different has more to do with syntax than semantics. Most people don't know what the semantic differences even are, because getting past the alien syntax is too much effort. When you do, the differences aren't all that huge.
> It's only to beginners that they feel strange - pretty soon programmer learns them, gets used to them and refuses to substitute them with more readable longer variants
That reminds me of how McCarthy always intended to replace S-expressions with M-expressions, but it never happened because programmers found they liked the S-expressions after all.
I believe Shen [1] creates m-expressions when certain expressions are passed through its compiler. Shen should appeal to Haskellers and Lispers, I believe. Deech has created a Shen port for elisp [2], and it exists on many other language platforms, because it runs on an enriched lambda calculus called Klambda, implemented in a small set of functions.
> Do you think the APL family can exist without the terse syntax?
I have yet to meet a programming model that can only exist in one specific syntax. The closest I've seen is lisp-style metaprogramming, which requires syntax that makes its nesting structure obvious (you're operating on pieces of syntax, so of course you need to know something about it); of course, you could use indentation instead of parens to show that structure if you like. APL-like operator lifting is entirely a semantic issue. It can exist just fine in s-expressions, ML-like syntax, Python-like syntax, etc.
APL-family languages typically even tie their own hands in terms of semantic flexibility because of their syntax. They have separate syntactic classes for (first-order) data, first-order functions, and second-order functions. The parser doesn't know what to do with an identifier unless it knows that identifier's definition, which makes compiling APL pretty awkward (ever seen a parser that tries to do data flow analysis?). It also limits their flexibility as functional programming languages. J, at least, by keeping a symbolic representation of all functions at run time, enables a sort of workaround where you can package up a vector of first-order functions, but this is much more awkward to work with than actual first-class functions.
But what really disappoints me about APL-style syntax is limiting its flexibility as an array-processing language. The language semantics offer this great ability to automatically "lift" any operation to work on arbitrarily high-dimensional arrays. Want the square root of every number in a 3-tensor? Go for it! Want to average every column in a matrix? Just average it! In more recent incarnations, you can even mix and match the shapes you're lifting up to, like adding a vector to every column in a matrix.
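A rough q illustration of the lifting described above (examples mine; in q, atomic functions pervade nesting and aggregates work per column):

sqrt (1 4; 9 16)       / atoms lift through nesting: (1 2f; 3 4f)
avg (1 2; 3 4; 5 6)    / a matrix is a list of rows; avg gives column means: 3 4f
10 20+(1 2; 3 4)       / the vector is added down each column: (11 12; 23 24)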
But that flexibility stops short if your function needs three inputs. Infix-only syntax has no way to write function application with more than two arguments, so you're forced to pack some of those inputs together into a single argument. Now you can't choose to lift the operation to every argument separately: you're forced to put the same array structure around multiple arguments. If you try to get around this by writing a second-order function (rather like currying), you now have one argument you can't lift to because only first-order functions get the dimension lifting.
So not only is the programming model not inextricably tied to this particular syntax, I'm not even convinced it's the best syntax for the programming model.
The semantic core of APL is operations on arrays. These same operations could be expressed in any other language. SaC[1] was a research project that expressly brought APL functions into a C-like language. Yorick[2] is an interpreted, C-like, array-based language with a production quality implementation.
'3 years of my career writing K/Q at a trading firm, eventually becoming the human debugger for my team's K/Q "investment"' - that sounds so familiar, lol. Except it's my second year of using kdb in a trading firm, and I still cannot run away from becoming a human debugger, and all I can use as a reference is code.kx.com.
Is code.kx.com your only resource because there's not much else written publicly in these languages, or is it because your employer blocks you from using other resources which are publicly available?
I got exposed to kdb+ while interviewing in quant and would kill to use it in a non-financial metrics setting, but the language and ecosystem just can't let me get there. kdb+ is so fast and powerful you'd think it impossible.
They're not directly comparable, but kdb+ would crush InfluxDB head to head in an apple/orange comparison if used for the same thing. I actually can't think of any time series-like store that remotely comes close to the feats I have seen kdb+ with Q accomplish. Too bad the productivity and knowledge overhead is so high.
Seriously, I'm working in realtime metrics right now and I spent a week playing with kdb+ as the central store. I can tell there's a gold mine there, but I'm just not swinging a big enough pickaxe. If there was a dumb Python/Go/etc typical ops hacker frontend to that whole system I would be throwing every dollar I have to my name at it; my kingdom for burying that system under a few layers of abstraction and paying some quant Q hacker a Ferrari or two a year to hide the sausagemaking from everyone else.
(Related aside: Like building furniture from stacks of cash? Learn K/Q/kdb+ and head for NYC.)
>kdb+ is so fast and powerful you'd think it impossible.
I suspect there is a religious/mythological aspect to K/Q's reputation for speed. During the year that I was using Q, I often found it to be slower than the equivalent Matlab code (which benefits from a JIT) or even NumPy (which has many naively implemented operations).
IMHO, Matlab is too clunky and inelegant for data transformation compared to APL-inspired languages, which need a slightly mathematically inclined user to appreciate and unleash their true power, with 10 times less code.
The biggest advantage of a language like Q is that it's inherently parallel. Map-reduce is naturally built into the language, which allows users to think in terms of breaking problems down into sub-problems and parallelizing them via the "parallel apply" construct[1].
Another very powerful feature is the braindead simplicity of inter-process communication between kdb+ instances [2].
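To illustrate the IPC point, a minimal sketch of my own (hopen, handle calls and hclose are standard q; the port is made up):

h:hopen `:localhost:5001    / open a handle to another kdb+ process
h"2+2"                      / synchronous call: returns 4
(neg h)"x:42"               / asynchronous message (fire and forget)
hclose h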
I hear this often about both array and functional languages but I think this kind of slogan leads to a misunderstanding of the MapReduce framework (and what's special about it).
In a functional or array language, map and reduce have the following signatures:
map : key list -> (key -> value) -> value list
reduce : value list -> ((value, value) -> value) -> value
These are strictly less powerful than the functions used in MapReduce:
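map    : (k1, v1) -> (k2, v2) list
reduce : (k2, v2 list) -> v2 list
(Those are the signatures as given in the original MapReduce paper; the shuffle that groups values by key between the two steps is where the extra power comes from.)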
Combining this API with (1) a distributed file system that lets you co-locate computation with data across many computers and (2) a parallel partitioning algorithm enables petabyte-scale computation. It's qualitatively different from in-memory array processing.
Does the Q/K implementation actually execute in parallel? I've always had a hard time figuring out exactly why K is supposed to be so fast (except for column stores, interpreter fitting in L1 cache, and fast hand-written primitives).
Yes, you can start the server with any number of slave threads. As long as you're not writing data to the global namespace (i.e., if you can keep your code purely functional), then, as pointed out above, one can use the "peach" (parallel each) operator [1] to parallelize execution. Further, you also get "distributed each" to spread the load across multiple processes instead of threads [2].
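A sketch of the two modes (the workload is made up; -s and .z.pd are the documented knobs, see code.kx.com for details):

/ threads: start q with -s 4, and peach splits the work across 4 threads
{sum exp x?1f} peach 8#1000000
/ processes: start q with -s -4 and point .z.pd at handles to worker processes
.z.pd:{`u#hopen each 5001 5002 5003 5004}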
>machine: 16 core 256GB (in all cases: date partition, sym index. all queries in RAM.)
...
>all query data is cached in RAM (no disk access).
1) What happens when you run these benchmarks on a machine with only 16GB of RAM?
2) How does the KDB performance compare to doing the equivalent operations on a Pandas DataFrame (which, since these are simple in-memory operations, seems like the only fair comparison).
> 1) What happens when you run these benchmarks on a machine with only 16GB of RAM?
kdb+ uses ~1.2 MB of resident RAM at startup. In addition, kdb+'s storage model in memory and on disk has very little overhead (a few hundred bytes per column) over the raw binary data.
The above, combined with memory mapping, allows large databases to be queried with great performance.
I query billion-row futures databases on a MacBook Pro with 16 GB of RAM with kdb+.
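For example (path and schema hypothetical), mapping a partitioned database is one line, and a query touches only the columns and partitions it needs:

\l /data/taqdb    / memory-map the partitioned database
select max bid by sym from quote where date=2014.01.15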
> 2) How does the KDB performance compare to doing the equivalent operations on a Pandas DataFrame (which, since these are simple in-memory operations, seems like the only fair comparison).
Could someone provide the equivalent Pandas code running against similar TAQ data for the queries in the following benchmark?
> They're not directly comparable, but kdb+ would crush InfluxDB head to head in an apple/orange comparison if used for the same thing. I actually can't think of any time series-like store that remotely comes close to the feats I have seen kdb+ with Q accomplish. Too bad the productivity and knowledge overhead is so high.
Do you have any more specific use cases for the kind of analysis kdb+ is used for? I'm working with time-series sources that represent streams of internal variables from embedded systems and would like to evaluate whether kdb+ brings something to the table that we could use. (What we are doing probably falls into the category of Complex Event Processing, i.e. windowed operations on the data stream with some predicates being evaluated globally.)
kdb+/q is great for this sort of thing (data stream processing and analytics). The "traditional" kdb+ example/benchmark is some query on big data, and it excels at that. But another use case is to set up several real-time kdb+ processes, like a processing pipeline or distributed workload, that react to real-time messages/events and produce some output (derived analytic, alert, action, etc...). One example is trading, where you have real-time market data being sent to some kdb+ processes, which run some analytics/rules and issue/manage orders sent to the market. It takes surprisingly few lines of q code to set something like this up with kdb+.
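As a minimal sketch of such a process (table, columns and the alert rule are made up; upd is the conventional callback a kdb+tick tickerplant invokes on its subscribers):

trade:([] time:`timespan$(); sym:`$(); price:`float$(); size:`long$())
upd:{[t;x] insert[t;x]; if[any 10000<x`size; -1"alert: large trade"];}    / assuming x arrives as a table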
I really like J and the array-based languages I've played with in the past. Can you quantify "building furniture from stacks of cash"? Particularly in comparison to the stories recently circulating [0] comparing AmaGooSoftBook salaries?
That sounds like the low-end base salary for ~entry level. It probably gets topped up with ~100k bonus. There are a couple places that hire many but don't pay well.
At what I think is the high end, I see about one offer per quarter of $500-900k base with a $1-2M bonus (usually guaranteed for the first year) for a pure technology role. Add more front-office work and the bonus potential shoots up (but it's a tough gig at the moment, at least).
For context, non-kdb principal-level roles I've seen on the west coast top out at around $1M, mostly in RSUs, from, say, a $250k base at the usual names.
I've seen offers in the $1-10M range on the east coast to build competitive technology. Amusingly, on the west coast, I've only seen the stereotypical "be my technical cofounder and get screwed on comp" offers to do so.
Wow. I've been playing with APLs on and off for a while now, so I'd be very interested to hear more (e.g. where one finds such offers). Any chance you'd be up to chat? My email's in my profile.
Drop me a note (email in profile). I'm not a recruiter, these are the offers I've been contacted with, through my personal network based on past gigs.
It's not just knowing an APL though -- far from it. For the chi/nyc it's a mix of hardcore low-level stuff, and some market knowledge. For the west coast, a mix of low level stuff (less hard core) and more distributed systems stuff.
I shouldn't say too much. But more than one large firm has explored writing an in-house replacement and floated actual $ offers.
Occasionally someone pops up wanting to start a company to compete with them, but they're typically offering paper of dubious value, covered in slime.
In any event, I'm not particularly interested in going to war to steal their market share. I do worry that they're going to go into a long decline to irrelevancy, though - I'd very much prefer that not to happen. I'd rather they/FD write a new generation building on the best of Arthur's work, plus some things more suited to new platforms and workloads. There are big opportunities there.
Agreed. The variety of workloads (a la Stonebraker's One Size Doesn't Fit All paper) is where many big opportunities lie. I'm interested to see what kinds of workloads and use cases are of interest in the marketplace.
It is very sad that they won't open up, and very short-sighted as well. I'm sure Whitney and Kx are perfectly happy with their millions, but they are sitting on one of the most impressive software achievements around and refusing to share it. It's not even like it would cost them. First Derivatives could grow a consulting empire. I mean, look at open-source MongoDB. It is a piece of shit, and it is valued at $1.4B. KDB is the finest thing I have ever used, and First Derivatives purchased a controlling stake in Kx for £36.0M. It makes no sense.
Maybe kOS will be open-source. Maybe it will be the big one, Arthur's chance to change the course of software development history.
It is really sad indeed. But I suggest you take a look at the J language. It's free for commercial use, unless you want to integrate JDB, the equivalent of the KDB+ column store. It also provides a rich feature set:
After going through the J primer, KDB+/Q feels like a distillation of J - it basically took the most accessible ideas of J and packaged them for mainstream production use. Some of its features make it easy to set up a computing harness for electronic trading in no time:
I am aware of J, but it seems to me that J is comparable to K, not Q/KDB+. If you want a Python comparison, then K is Python, and Q/KDB+ is more like pandas - except much, much quicker, and more powerful, but without as many tools and packages to support it. But that could be fixed - those tools and packages don't exist because the community isn't there, because it is (sadly) closed source.
J versus K is like C versus C++: similar syntax but a different programming style. J is array-based and K is list-based. J is mostly just APL converted to an ASCII representation. Q is mostly just some light syntactic sugar over K.
"they are sitting on one of the most impressive software achievements around and refusing to share it"
They tried. About a year ago they released the 32-bit version for free with a shockingly liberal license - anything goes, including commercial use, unlimited. A couple of months ago they pulled it back and replaced it with a free non-commercial-use-only license. Well, they are the owners, it is their call, but still K will remain "one of the most impressive software achievements around", orthogonal to most of the [fucked up] IT trends of the past 15 years. It is a shame if it fades into a quantitative elitist oblivion.
This is the classic "innovator's dilemma". My 2 cents: Kx has a good business selling kdb+ licenses to the big banks and trading firms. They've been exploring how to expand beyond this niche, but many big data/analytics startups go straight for the FOSS solutions. In addition to the cost/licensing advantages of FOSS, a large user base and ecosystem has developed around things like MongoDB, giving them compelling business advantages over kdb+ (even with kdb+'s performance advantages).
Although MongoDB is a demonstration of how to build a successful business around open source, it's very different to start out that way rather than move that way after you're already established as a closed/licensed product. Kx would have to essentially kill off their existing profit stream in the hope of building a larger business around a FOSS kdb+. That's a risky proposition, fraught with many pitfalls.
It is a shame they backed off the fully open 32-bit version. At least that had some potential to spur user-base/ecosystem growth (at a minimum, it could have encouraged the development of novel clients/editors/REPLs/debuggers/charting/etc.), without threatening their core profit stream.
It's not obvious to me that an open-core business can sustain the necessary margins to be interesting as an engineering company rather than a glorified services business. There have been very few examples (a friend of mine argues it's just Red Hat).
You are correct, though, that their model over the years has been to extract a large amount, up front, from a small number of users. They (mostly FD) have failed to make the leap to users outside finance due to what I'd call cultural reasons. They also lack strong technical leadership, imnsho (they're, at heart, not an engineering company).
I'd certainly take a swing at doing an open source version for them, but it's not clear to me that they'd know how to play it.
If you got a free 32-bit version while the terms were "shockingly liberal", then maybe those are the terms that govern the use of that binary? I don't know for sure.
I do know that, in addition to changing the license, they have made changes to the software since that time. The size of the binary has increased.
I worry more about Kx being acquired by some large company, maybe a competing database vendor, that cares little about software quality.
KDB with a 2GB addressable RAM limit is almost pointless. It was good that they were at least trying to encourage new developers, but you could never build anything serious with that limitation.
Ask the commenter one down from me, 'wsfull, he knows what I'm talking about!
q/kdb+ is _enormously_ fast (or it was when I used it), but that was a result of 1) having a specific subset of problems to which the binary was highly tuned, 2) by Arthur Whitney, who, in my mind, is one of those 100x engineers and the key reason it was so good, as evidenced by 3) his no longer being with them, such that you'll notice an appreciable decline in code quality. I'm sure if AW were still working on the code base, you'd be seeing a lot more AVX-512 SIMD usage and clever things like that. (Take IDA to your q binary; it's... acceptable, but nothing magical like AW's work. In fact, the lack of attention to detail is so significant that the new engineers allowed even a poor rev. eng.[1] like myself to use a very, very standard method to get a symbol table fully populated, confirming my suspicion that there's very little platform-specific code.)
RE: 36MM - that makes sense. q/kdb is the definition of "low volume, high margins". You only have so many IBs in Midtown, wealth-management funds in Stamford, and a PIMCO here and there in Newport to shop your product to. (As opposed to, say, Oracle, where there's broad appeal and residual government contracts thanks to 20-year-old legacy PL/SQL code that's kept in production.)
[1] E-mails in the profile if you want to hear how I got a symbol table in, but I'm sure you already guessed by now.
Why use such a thing? Array languages are very powerful for dealing with high-dimensional data. R, Matlab, Octave, and Julia can all do this and are satisfactory for most people. However, proponents claim that languages such as K are much more expressive.
The APL family gets a bad rap for being "read-only", but I think it's mostly because they've fallen out of favor and have become unpopular. People who have experience working with these languages can encode/decode incredible amounts of information to/from a few lines once they see "phrases" and "sentences" instead of "my cat chewed the 56k modem line". We don't write Integral() in calculus when we want to take an integral, so why not have similar shorthands in computation?
Here's the vocabulary of the sister language J for comparison:
To be fair, those things could all be deliberately unspecified. Granted, the shortness of a language's spec is less impressive when it is simply the result of the language being woefully underspecified.
Does anyone have a good resource on this family of languages or otherwise on any of them individually? Every time I've run across references to K/Q I've poked around a bit and it feels like they kind of exist as linguistic curiosities in some vacuum with a vaguely unexplained creation myth about a man named Whitney.
How would one actually go about getting a username and password? I see these languages often, and know this is the true successor since Whitney is working on it, but I think all of these are neat. Or should I just learn Kona/Kerf/xxl?
If you just want to play around (no commercial use) you can download kdb+ from here: https://kx.com/software-download.php, but you only get the 32-bit version for free, so keep your DB under 4 GB. With kdb+ you get K, the language; Q, which is a superset of K; and a column-oriented (time-series) database engine. I recommend against using it, because if you want to do something meaningful (like make money somehow) then you need to buy a commercial license. I hear they cost around $2k per machine (or possibly per core).
Kona is nice, but it is a full version behind K (Kona is a copy of K3, and kdb+ is on K4) and it isn't quite as fast/optimized. It's great if you are set on using K but don't have the cash and don't need database integration.
Kerf (by the same guy who made Kona) has a friendlier syntax but seems to be commercially oriented, similar to K (as in: email us for free demo software, but pay up when you want to use it for real).
J is open source, free for commercial use, has some decent tutorials written for it, and even has a free database engine (JDB requires a license for commercial use, but Jd seems to be free for everything). It probably isn't as fast as K, but it has a nice feel to it. The runtime is under 3 MB, and even with a Qt IDE and a bunch of extras it's under 30 MB. You can make Qt-based GUIs with J and do quite a few things.
I started learning J just last week and have been doing some problems on Project Euler to get familiar with it. I like it quite a bit. It even has built-in functions for prime factorization and the prime sieve, and an extended-precision mode, which makes solving some problems very easy.
K/Q is famous for extremely short code, but that is partly due to a misunderstanding of K/Q's nature. Sure, some things are expressed more concisely because you are operating on arrays rather than writing loops, but most of the conciseness comes from two other factors. First, there are syntactic tricks that use fewer characters, e.g. single-character names and $[c;t;f] instead of IF-THEN-ELSE. But the big thing most people ignore is that K/Q is a DSL. It is designed to connect to a server, acquire tick data, and do time-series analysis on clean numeric arrays. Stick to those operations and your code is amazingly short. If you expand the notation to something more readable and try doing more general programming tasks, the verbosity becomes about like Lua's.
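For instance, a tiny sketch of my own showing the terse conditional:

f:{$[x>0;`pos;`neg]}    / $[cond;then;else] in place of if/else
f each -2 0 3           / `neg`neg`pos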
Has anyone used kdb+ in a real-time, mission-critical setup - where kdb+ is in the critical path? I am referring mainly to trading applications. Can it be used as a low-latency / high-throughput system, reliably?
A good question. Older versions of K had if and while as syntactic sugar. You could always do without them by using cond and adverbs. K5 did away with these syntactic forms, and for whatever reason Arthur decided to put them back in k6.
Has anyone looked at SciDB (http://www.paradigm4.com/) as a replacement for KDB? It seems like it would fit the bill, and it has a community edition. I do see some usage in the financial markets.
The symbolism implicit in the characters used in APL is lost with J/K/Q and any other ASCII variant. I highly recommend reading Ken Iverson's Turing Award Lecture too.
For anyone wondering what K is, I'll repost this old email exchange between an ML person and a K person (it's already online, so no privacy violation):
On Wed, 12 Oct 2005, Fermin Reig wrote:
> Dear Rico,
>
> I have read Mark Joshi's book on C++ design patterns (as well as your
> review) and I'd like to share my opinion of the book with you. (By the
> way, I find the information in your website very useful.)
>
> My interest in the book was to try to learn how theory is put into
> practise. The best way to do that is by actually implementing things, so
> I did: I reimplemented his C++ code in OCaml, a functional, OO language
> that I like using. However, instead of a direct translation of C++
> classes to OCaml classes, I chose a non-OO design. The result is quite
> interesting from the point of view of code complexity and programming
> productivity (and hence cost). Here's a summary:
>
> Monte Carlo (datatypes only)
> C++: 264 lines of code (LOC), (7 classes, 20 methods), 12 files
> ML : 33 LOC, (3 datatypes, 1 function), 2 files
> LOC ML/C++: 33/264 = 13%
> Binomial trees (datatypes only)
> C++: 264 + 138 = 402 LOC, (10 classes, 30 methods), 18 files
> ML : 33 + 12 = 45 LOC, (3 datatypes, 3 functions), 2 files
> LOC ML/C++: 45/402 = 11%
> Binomial trees (datatypes + main algorithm)
> LOC ML/C++: 165/522 = 32%
>
> I'm not claiming that Mark's C++ code is bad (actually, it's quite
> good). However, using a better tool results in obtaining a better final
> product. (Performance of Ocaml's code is competitive with C++ as well.)
>
> I have written slides with more details about this, which I have shown
> to a few friends. If you would like to see them, I can email them to
> you. (I have also told Mark Joshi about my evaluation.)
>
> Regards,
> Fermin
Hi Fermin,
Not bad. But my K implementation only uses 2 lines of code though...
EuroOpt:{[S;K;r;v;T;n;po] / spot, strike, rate, vol, expir, steps, payoff
/ initialize constants
p:(%2*m)*(1%D)-d:a-m:_sqrt -1+a*a:.5*(_exp dt*r+v*v)+D:_exp -r*dt:T%n;
/ apply n binomial steps
:*n{-1_ D*(p*1!x)+x*1-p}/K po' S*(1%d)^(2*!n+1)-n
}
...and here are some test cases...
po:{0|x-y}; / arbitrary payoff function: max(0, K-S) for put
/ compare results for n = 16, 32, 64, 128, and 256
\t EuroOpt[5;10;.06;.3;.5;;po]' _16*2^!5
/ pass in payoff functions for put, call, and digital
\t EuroOpt[8;10;.06;.3;.5;500;]' {0|x-y},{0|y-x},{y>x}
/ investigating convergence for a digital option with 1-300 steps...
EuroOpt[8;10;.06;.3;.5;;{y>x}]' 1+!300
So yes, I agree with you. Functional languages are cool and C++ is
verbose & restrictive. Unfortunately, in the real world people don't care
about any of this... :-)
Cheers,
- Rico
And that was the previous version of K, before Q/kdb+. Things are probably even more concise now, but I stopped following this about 10 years ago. Good times.
Not sure why a downvote was left without a comment, but I'll upvote you and leave a comment that I wholeheartedly agree. This is not an acceptable method of packaging or distributing software.