ubolonton_ · on April 25, 2025

I introduced ClickHouse at my company 2 years ago, and came to the same conclusion.

It seems to have become the dominant storage choice for new observability startups.

And the newly introduced JSON type should help it win even harder.

ubolonton_ · on Feb 22, 2024

What do you mean by “parse of that tree to get useful structures out”? Can you provide some concrete examples?

IshKebab · on Feb 22, 2024

Yeah suppose you write a simple config language like:

  let a = 12;
  let b = a + 5;
  ...

Tree-Sitter will give you a tree like

   Node(type="file", range=..., children=[
     Node(type="let_item", range=..., children=[
       Node(type="identifier", range=...),
       Node(type="expression", range=..., children=[
         Node(type="integer_literal", range=...)
   ...

Whereas Nom/Chumsky will give you:

    struct File {
      let_items: Vec<LetItem>,
      // ...
    }
    struct LetItem {
      name: String,
      expression: Expression,
    }
    ...

Essentially, Tree-Sitter's output is untyped and ad hoc, whereas Nom/Chumsky's is fully validated and statically typed.

In some cases Tree-Sitter's output is totally fine (e.g. for syntax highlighting, or rough code intelligence). But if you're going to want to do stuff with the data like actually process/compile it, or provide 100% accurate code intelligence, then I think Nom/Chumsky make more sense.

The downsides of Nom/Chumsky are: pretty advanced Rust with lots of generics (error messages can be quite something!), and keeping track of source code spans (where did the `LetItem` come from) can be a bit of a pain, whereas Tree-Sitter does that automatically.

ubolonton_ · on Feb 23, 2024

Ok, understood. I was confused by the phrase "parse of that tree".

Tree-sitter's output is closer to being "dynamic" than "untyped", though.

It's not too hard to build a layer on top of tree-sitter (out of the core lib) to generate statically typed APIs. I haven't felt the need for that yet, but it may be worth exploring.
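
A minimal sketch of what such a typed layer could look like, written by hand rather than generated. The `Node` struct here is a hypothetical stand-in for a dynamically typed syntax node, not the tree-sitter crate's API, and the conversion only handles a `let_item` whose expression is a single integer literal:

```rust
// Hypothetical stand-in for a dynamically typed syntax node.
// This is NOT the tree-sitter crate's API; it only mirrors the tree shape above.
#[derive(Debug, Clone)]
struct Node {
    kind: String,
    text: String,
    children: Vec<Node>,
}

// Statically typed view of a `let_item` node (integer literals only).
#[derive(Debug, PartialEq)]
struct LetItem {
    name: String,
    value: i64,
}

#[derive(Debug, PartialEq)]
enum ConvertError {
    UnexpectedKind(String),
    MissingChild(&'static str),
    BadInteger(String),
}

// Convenience constructor for building example trees.
fn node(kind: &str, text: &str, children: Vec<Node>) -> Node {
    Node { kind: kind.to_string(), text: text.to_string(), children }
}

// Finds the first direct child with the given kind.
fn child<'a>(parent: &'a Node, kind: &'static str) -> Result<&'a Node, ConvertError> {
    parent
        .children
        .iter()
        .find(|c| c.kind == kind)
        .ok_or(ConvertError::MissingChild(kind))
}

// Validates the dynamic node and converts it into the typed struct.
fn let_item_from_node(n: &Node) -> Result<LetItem, ConvertError> {
    if n.kind != "let_item" {
        return Err(ConvertError::UnexpectedKind(n.kind.clone()));
    }
    let name = child(n, "identifier")?.text.clone();
    let expression = child(n, "expression")?;
    let literal = child(expression, "integer_literal")?;
    let value = literal
        .text
        .parse::<i64>()
        .map_err(|_| ConvertError::BadInteger(literal.text.clone()))?;
    Ok(LetItem { name, value })
}
```

All validation happens once, at the boundary; code past that point works with `LetItem` and never checks node kinds again. A generator would emit functions like `let_item_from_node` from the grammar's node type description instead of having them written by hand.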

> actually process/compile it

At work, I built a custom embedded DSL, using tree-sitter for parsing. It has worked well enough so far. The dynamically-typed nature of tree-sitter actually made it easier to port the DSL to multiple runtimes.

> provide 100% accurate code intelligence

Totally agree that tree-sitter cannot be used for this, if we are aiming for 100%.

danielvaughn · on Feb 22, 2024

Not the person you’re asking, but basically anything that needs to happen after the initial parsing stage. So you convert your raw text into an AST, but there’s usually some processing you need to do after that.

Maybe you need to optimize the data, maybe you need to do some error checking. Lots of code is syntactically valid but not semantically valid, and usually those semantic errors will persist into the AST (in my limited experience).
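
As a rough illustration of "syntactically valid but not semantically valid", here is a sketch of one post-parse check for the config language from the example upthread: reporting variables that are used before any `let` defines them. The `Expression` variants and the function names are assumptions made for this sketch, not something from any of the libraries discussed:

```rust
use std::collections::HashSet;

// Hypothetical typed AST for the toy config language.
#[derive(Debug)]
enum Expression {
    Integer(i64),
    Variable(String),
    Add(Box<Expression>, Box<Expression>),
}

#[derive(Debug)]
struct LetItem {
    name: String,
    expression: Expression,
}

// Appends to `out` every variable in `expr` that is not yet defined.
fn collect_undefined(expr: &Expression, defined: &HashSet<String>, out: &mut Vec<String>) {
    match expr {
        Expression::Integer(_) => {}
        Expression::Variable(name) => {
            if !defined.contains(name) {
                out.push(name.clone());
            }
        }
        Expression::Add(lhs, rhs) => {
            collect_undefined(lhs, defined, out);
            collect_undefined(rhs, defined, out);
        }
    }
}

// Walks the items in order; a name becomes defined only after its own `let`.
fn undefined_variables(items: &[LetItem]) -> Vec<String> {
    let mut defined: HashSet<String> = HashSet::new();
    let mut undefined = Vec::new();
    for item in items {
        collect_undefined(&item.expression, &defined, &mut undefined);
        defined.insert(item.name.clone());
    }
    undefined
}
```

A program like `let a = 12; let b = a + c;` parses without trouble, yet this pass would report `c`, which is the kind of error that survives into the AST.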

ubolonton_ · on April 4, 2023

In Emacs, local-function-key-map has this entry:

    C-x @ s                   event-apply-super-modifier

So to get s-g to work, you'd configure the terminal emulator to convert the Command+G key press to the C-x @ s g sequence.

    # Hex code (iTerm)
    0x18 0x40 0x73 0x67

    # kitty (without Emacs doing the integration)
    map cmd+g send_text all \x18@sg

Konsole somehow seems to have this done for the whole English alphabet, so it just works out of the box there.
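
To show where the hex codes above come from, here is a small sketch that derives the byte sequence for any printable ASCII key. The function names are made up for this example; the only facts it relies on are that a control character is the letter's code masked with 0x1f, and that the sequence is `C-x @ s` followed by the key:

```rust
// Maps an ASCII letter to its control character, e.g. 'x' -> 0x18 (C-x).
fn control_byte(letter: char) -> Option<u8> {
    if letter.is_ascii_alphabetic() {
        Some((letter as u8) & 0x1f)
    } else {
        None
    }
}

// Builds the bytes for `C-x @ s <key>`, the sequence that
// event-apply-super-modifier turns into s-<key>.
fn super_sequence(key: char) -> Option<Vec<u8>> {
    if !key.is_ascii_graphic() {
        return None;
    }
    let ctrl_x = control_byte('x')?;
    Some(vec![ctrl_x, b'@', b's', key as u8])
}

// Formats bytes the way the iTerm hex code setting expects them.
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:#04x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For `'g'`, `super_sequence` gives the bytes 0x18 0x40 0x73 0x67, matching the iTerm entry above.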

For 2-modifier bindings, I guess you can add a creative entry to local-function-key-map. (If the other modifier is Shift, you can just use uppercase letters.)

ubolonton_ · on Feb 24, 2021

When tree-sitter reaches 1.0 [0], it may be possible to eliminate the tree-sitter-langs upstream, or both.

[0]: https://github.com/tree-sitter/tree-sitter/issues/930

ubolonton_ · on Feb 23, 2021

> the query DSL would be a bit more sophisticated, allowing you to specify the actual name resolution rules of your language.

This sounds very interesting. Will the query DSL (spec) be available to the public?

dcreager · on Feb 23, 2021

That's the current plan! In particular, because we want to allow language communities to implement support for their own languages, and not have to be blocked on my team finding the time to do it. (Just like they can do now with the parser and syntax highlighting / fuzzy code nav rules.) Linguist is our role model here — it currently includes language detection and (regex-based) syntax highlighting rules for 500+ languages. Most of those are contributed by the community. There's no way that my team can migrate all of those in any reasonable amount of time, especially while having to balance that with other feature development and operational responsibilities.

ubolonton_ · on Feb 9, 2019

This is probably between the baseband firmware and the SIM card, so rooting wouldn't help. And it's using A-GPS (probably in MSA mode, where the location is derived on the servers, not the phones), not just cell tower triangulation.

ubolonton_ · on Feb 9, 2019

It appears that you will be able to opt out of Apple's new Enhanced Emergency Data. However, that part is device-initiated anyway. The traditional Network-Initiated Location Requests probably cannot be opted out of.

ubolonton_ · on Dec 5, 2017

https://magit.vc/

ubolonton_ · on May 25, 2017

It's easier to see commits of a branch grouped together in most history viewers. Even though sorting commits topologically can help, most history viewers don't support that option.

When there is an undesired behavior that is hard to reason about, git bisect can be used to determine the commit that first introduced it. With a normal merge, it will point to the merge commit, because that was the first time the 2 branches interacted. With a rebase, git bisect will point to one of the rebased commits, each of which already interacted with the branch coming before.

Resolving conflicts in a big merge commit vs in small rebased commits is like resolving conflicts in a distributed system by comparing only the final states, vs inspecting the actual sequences of changes.
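
The bisect point can be sketched as a binary search over a linear history. This is a toy model and not how git bisect is implemented (git works on the commit graph); the `Commit` struct, its `bad` flag, and the commit labels are made up for illustration:

```rust
// Toy commit: `bad` says whether the undesired behavior shows up at this commit.
#[derive(Debug)]
struct Commit {
    label: &'static str,
    bad: bool,
}

// Returns the index of the first bad commit, assuming every commit before it
// is good and every commit from it onward is bad.
fn first_bad<T, F>(history: &[T], is_bad: F) -> Option<usize>
where
    F: Fn(&T) -> bool,
{
    let (mut lo, mut hi) = (0, history.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&history[mid]) {
            hi = mid; // first bad commit is at `mid` or earlier
        } else {
            lo = mid + 1; // first bad commit is after `mid`
        }
    }
    if lo < history.len() { Some(lo) } else { None }
}

// Rebased feature branch: each feature commit sits on top of main, so the
// search can land on the small commit where the interaction first breaks.
fn rebased_history() -> Vec<Commit> {
    vec![
        Commit { label: "m1", bad: false },
        Commit { label: "m2", bad: false },
        Commit { label: "f1'", bad: false },
        Commit { label: "f2'", bad: true },
        Commit { label: "f3'", bad: true },
    ]
}

// Merged feature branch: if both branches are fine on their own, the merge is
// the first commit where they interact, so the search can only blame the merge.
fn merged_history() -> Vec<Commit> {
    vec![
        Commit { label: "m1", bad: false },
        Commit { label: "m2", bad: false },
        Commit { label: "merge", bad: true },
    ]
}
```

In this model, searching `rebased_history()` lands on `f2'`, a single small commit, while searching `merged_history()` lands on `merge`, which contains the whole branch's worth of changes.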

ubolonton_ · on March 1, 2017

Thanks! I skimmed through the doc. It looks very clean from a high-level view.