Show HN: A parser for electronic component descriptions

kasbah · on Aug 11, 2018

Hey folks, I created this to augment the Octopart API while working on bill of materials tools for https://kitspace.org.

Working with Nearley was a fun experience and I recommend the blog post that got me started with it to anyone who is interested in writing a parser in JS: https://medium.com/@gajus/parsing-absolutely-anything-in-jav...

We have started porting it to Antlr to make it language-agnostic, though we are still open to looking at other options as well. Does anyone have good experience with, or ideas for, alternative approaches?

ReverseCold · on Aug 11, 2018

There are quite a few components so common that people refer to them only by their part number (ESP8266, 555, WS2812, etc) - are you planning to support things like that too?

Right now it looks like only common/standard parts are parseable (resistors, LEDs, etc).

kasbah · on Aug 11, 2018

I think once you are entering actual manufacturer part numbers you are better off deferring this to a parts database or search engine. In my uses I have been deferring this to an Octopart search. You can see this working on e.g. https://bom-builder.kitspace.org (enter something into the description field and click around on the wand icons).

We are planning to include standard part naming schemes such as Pro Electron [1] and JEDEC [2] to recognize common diodes and transistors where it makes sense. The more interesting approach to me is to actually try to recognize transistor descriptions such as "npn sot23 125mW beta=200 ic>=200mA". There is some work in progress, by dvc94ch, for both approaches (if I recall correctly) on the antlr branch: https://github.com/monostable/electro-grammar/pull/4

[1]: https://en.wikipedia.org/wiki/Pro_Electron

[2]: https://en.wikipedia.org/wiki/JEDEC
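
To make the transistor description idea above concrete, here is a minimal sketch in Python that classifies the whitespace-separated tokens of a string like "npn sot23 125mW beta=200 ic>=200mA". It is not the electro-grammar implementation: the function name, the regular expressions, and the output keys (`polarity`, `package`, `power_w`, `gain`, `ic_a`) are all assumptions made for illustration, and a real grammar would cover far more cases.

```python
import re

# Divisors for the only SI prefixes handled in this sketch (milli or none).
PREFIX_DIVISOR = {"m": 1000, "": 1}


def parse_transistor(description):
    """Classify each token of a transistor description (illustrative only)."""
    result = {}
    for token in description.lower().split():
        if token in ("npn", "pnp"):
            result["polarity"] = token
        elif re.fullmatch(r"(?:sot|to)-?\d+", token):
            result["package"] = token.replace("-", "")
        elif m := re.fullmatch(r"(\d+(?:\.\d+)?)(m?)w", token):
            # Power rating, stored in watts.
            result["power_w"] = float(m.group(1)) / PREFIX_DIVISOR[m.group(2)]
        elif m := re.fullmatch(r"(?:beta|hfe)(>=|<=|=)(\d+)", token):
            # Current gain, stored as (comparison operator, value).
            result["gain"] = (m.group(1), int(m.group(2)))
        elif m := re.fullmatch(r"ic(>=|<=|=)(\d+(?:\.\d+)?)(m?)a", token):
            # Collector current, stored as (comparison operator, amps).
            amps = float(m.group(2)) / PREFIX_DIVISOR[m.group(3)]
            result["ic_a"] = (m.group(1), amps)
        else:
            # Keep anything unrecognized so it can be passed to a part search.
            result.setdefault("unknown", []).append(token)
    return result
```

Tokens that match nothing end up under `unknown`, which fits the approach described above of deferring actual manufacturer part numbers to a parts database or search engine.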

dantle · on Aug 11, 2018

You did a nice job with capacitors. It was cool to write X7R and have it show up as a characteristic.
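
For context on why X7R counts as a characteristic: it is an EIA class 2 ceramic dielectric code, where the first letter gives the lower temperature limit, the digit the upper limit, and the last letter the allowed capacitance change over that range. A rough Python sketch of decoding such a code follows. The function name and output keys are made up for illustration, the lookup tables list only a few common entries, and this is not how the parser itself represents the value.

```python
# Partial lookup tables for EIA class 2 dielectric codes (not exhaustive).
LOW_TEMP_C = {"X": -55, "Y": -30, "Z": 10}
HIGH_TEMP_C = {"5": 85, "7": 125}
CAP_CHANGE = {"R": "+/-15%", "U": "+22/-56%", "V": "+22/-82%"}


def decode_class2(code):
    """Decode a three-character class 2 code, or return None if unknown."""
    code = code.upper()
    if len(code) != 3:
        return None
    try:
        return {
            "code": code,
            "min_temp_c": LOW_TEMP_C[code[0]],
            "max_temp_c": HIGH_TEMP_C[code[1]],
            "capacitance_change": CAP_CHANGE[code[2]],
        }
    except KeyError:
        # Anything outside the partial tables (e.g. class 1 codes like C0G).
        return None
```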

gh02t · on Aug 11, 2018

That gets pretty fuzzy, though. For example, does ESP8266 refer to the actual chip itself (unlikely, at least in a hobbyist context) or one of the variety of modules? These general designations aren't really referring to specific parts, so much as classes of parts. That's probably not so useful here, given that this looks to be focused on parsing BOMs for ordering purposes.

elcritch · on Aug 11, 2018

Antlr, the last time I looked at it and tried to use it, was a pain. Version mismatches and incompatible changes... Plain old bison/yacc/lex or some of the newer Rust parsing libraries would probably be simpler and easier to maintain.

kasbah · on Aug 11, 2018

Yeah, dvc94ch, who wrote most of what's been done for v2 so far, was suggesting Rust and then WebAssembly. I'm kind of warming to it after using Antlr for a while. We both really need JS versions though, so I am still not sold. Compiling with Emscripten might be an option but it could be a lot of pain as well.

elcritch · on Aug 11, 2018

I’ve just tinkered with wasm and Rust, but it was surprisingly straightforward. It seems there is a huge amount of effort being put into the ecosystem. Great project so far! I’ll have to check out the BOM tools.

kasbah · on Aug 11, 2018

But if I'd like to support older browsers I am out of luck, no?

striking · on Aug 12, 2018

Emscripten compiles to asm.js, which is just a subset of JS. CanIUse will tell you some browsers don't support it, but it really means they don't support accelerating it. Your code may very well still run in those browsers (but I'm not sure what determines whether it does).

elcritch · on Aug 11, 2018

Lalrpop looks useful! https://news.ycombinator.com/item?id=10296149

thosakwe · on Aug 11, 2018

Bison simpler than Antlr? What?

I very strongly disagree.

carapace · on Aug 12, 2018

"Parsing with Derivatives". http://matt.might.net/papers/might2011derivatives.pdf There's a lot of stuff about this online now

Or just use Prolog.

(I'm not joking. I've been messing with parsers and such in Prolog the past week or so and I'm just blown away. I'm happy but also I feel stupid for not learning it sooner. I've wasted so much time and effort. I've been working too hard. Just learn Prolog and then use it, the total time saved will pay for the learning curve by the end of the year.)
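
For anyone curious what the derivative idea looks like in code, here is a small Python sketch. It is restricted to regular expressions (Brzozowski derivatives), which is the starting point that the linked paper generalizes to context-free grammars; it does not implement the paper's laziness, memoization, or fixed points. All class and function names are chosen for illustration.

```python
from dataclasses import dataclass


class Lang:
    """Base class for language expressions."""


@dataclass(frozen=True)
class Empty(Lang):  # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Lang):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Lang):  # matches exactly one given character
    c: str


@dataclass(frozen=True)
class Alt(Lang):  # a | b
    a: Lang
    b: Lang


@dataclass(frozen=True)
class Seq(Lang):  # a followed by b
    a: Lang
    b: Lang


@dataclass(frozen=True)
class Star(Lang):  # zero or more repetitions of a
    a: Lang


def nullable(l):
    """Return True if the language accepts the empty string."""
    if isinstance(l, (Eps, Star)):
        return True
    if isinstance(l, (Empty, Char)):
        return False
    if isinstance(l, Alt):
        return nullable(l.a) or nullable(l.b)
    if isinstance(l, Seq):
        return nullable(l.a) and nullable(l.b)
    raise TypeError(l)


def derive(l, ch):
    """Return the language of suffixes left after consuming ch."""
    if isinstance(l, (Empty, Eps)):
        return Empty()
    if isinstance(l, Char):
        return Eps() if l.c == ch else Empty()
    if isinstance(l, Alt):
        return Alt(derive(l.a, ch), derive(l.b, ch))
    if isinstance(l, Seq):
        first = Seq(derive(l.a, ch), l.b)
        # If a can be empty, ch may also be consumed by b.
        return Alt(first, derive(l.b, ch)) if nullable(l.a) else first
    if isinstance(l, Star):
        return Seq(derive(l.a, ch), l)
    raise TypeError(l)


def matches(l, text):
    """Derive by each character in turn, then check for the empty string."""
    for ch in text:
        l = derive(l, ch)
    return nullable(l)
```

As a usage example, with `digit = Alt(Char("0"), Char("1"))`, the expression `Seq(Seq(digit, Star(digit)), Char("k"))` accepts "10k" and rejects "k" and "10". This version is only a recognizer and does not simplify the derived expressions, so they grow with the input.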

kasbah · on Aug 12, 2018

I am reading up on this a bit and it seems very interesting but I am not quite sure it would be able to fill the gap that something like Antlr fills. We want to generate parsers in various languages (JS, Python, Go) from a single grammar and existing implementations of "Parsing with Derivatives" seem to be experimental and mostly in functional languages.

Prolog is really interesting too but how would it work in practice for our use case?

carapace · on Aug 12, 2018

Ah, I misunderstood what you were asking about. My suggestions don't really apply so much to your case.

You might take a look at the Meta-II metacompiler. It's really simple, and by the same token it's trivial to port to different languages. It's part of the basis for VPRI's PEG (Parsing Expression Grammar) parser generators.

https://en.wikipedia.org/wiki/META_II

http://www.bayfronttechnologies.com/mc_tutorial.html

matheusmoreira · on Aug 12, 2018

Nearley is amazing. The authors even implemented Joop Leo's optimization. If the other published optimizations were implemented as well, Nearley would probably become the Earley parser implementation. There is an issue suggesting this but it's marked as wontfix.

kasbah · on Aug 12, 2018

Maybe they are not interested in optimization for the sake of optimization? For my use case it has been plenty fast anyway (not that it's a particularly intensive use case).

gravypod · on Aug 11, 2018

Is there a formatter planned for formatting an object into a pretty printed string?

kasbah · on Aug 11, 2018

I have thought about it since it should be pretty easy to do. Feel free to open an issue or make a PR.
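
A rough idea of what such a formatter could look like, sketched in Python. The field names (`type`, `capacitance`, `resistance`, `size`, `characteristic`, `tolerance`) are assumptions about the shape of the parsed object, not a confirmed schema, and the function names are hypothetical.

```python
# SI prefixes from largest to smallest, as (scale, symbol) pairs.
SI_PREFIXES = [
    (1e6, "M"), (1e3, "k"), (1, ""), (1e-3, "m"),
    (1e-6, "u"), (1e-9, "n"), (1e-12, "p"),
]


def format_si(value, unit):
    """Format a positive value with the largest prefix that fits."""
    for scale, symbol in SI_PREFIXES:
        if value >= scale:
            # Three significant digits hides floating point noise.
            return f"{value / scale:.3g}{symbol}{unit}"
    return f"{value:g}{unit}"


def format_component(component):
    """Turn a parsed component dict back into a short description."""
    parts = []
    if "capacitance" in component:
        parts.append(format_si(component["capacitance"], "F"))
    if "resistance" in component:
        parts.append(format_si(component["resistance"], "ohm"))
    if "size" in component:
        parts.append(component["size"])
    if "characteristic" in component:
        parts.append(component["characteristic"])
    if "tolerance" in component:
        parts.append(f"{component['tolerance']:g}%")
    return " ".join(parts)
```

With these assumptions, a capacitor of 1e-7 F in size 0603 with characteristic X7R comes out as "100nF 0603 X7R".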