Hacker News | angadsg's comments

https://www.techinasia.com/ -- they have high quality tech journalism for Asia


This actually had some good focus on Indonesia and SEA in general, which was a pleasant surprise


IMO folks are better off deploying their own version where they can adjust a few knobs (e.g. split chunk size) to get better results, given that PDF Q&A is such a commodity application.

I wrote a version in under 50 lines with LangChain that runs in your terminal against any folder full of PDF documents - https://github.com/angad/dharamshala/blob/main/docs.py

return_source_documents is particularly helpful to get a sense of what is being sent in the prompt.


Consider adding a bit of overlap to the text chunks. Say, 300 characters (CharacterTextSplitter measures chunk_size and chunk_overlap in characters by default, not tokens):

  text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)

Otherwise, you'll likely end up with too many edge cases in which only part of a relevant context is retrieved :-)
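
To make the overlap idea concrete, here is a minimal pure-Python sketch of fixed-size character chunks that share an overlap. It is not how LangChain's CharacterTextSplitter works internally (that one splits on a separator first and then merges pieces); the function name split_with_overlap is made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 300) -> list[str]:
    """Split text into character chunks; consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A sentence that straddles a chunk boundary then shows up whole in at least one chunk, as long as it is shorter than the overlap.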


This is actually pretty insightful - I have done something similar, splitting my Obsidian data into chunks using paragraphs and headers as demarcation, but this solves a more interesting problem of nuance! I like it.


If you're interested in improved chunking, I mentioned a few strategies in my talk here (timestamp linked, <1min): https://youtu.be/elNrRU12xRc?t=536 that I used when building https://findsight.ai


If you're already splitting documents by paragraph, consider using (as much as possible of) the previous and next paragraphs as overlap.
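
A rough sketch of that, assuming the paragraphs are already split into a list of strings and the limit is a character budget (both assumptions of mine, as is the name chunks_with_neighbors). Whatever room is left after the paragraph itself is shared between the tail of the previous paragraph and the head of the next one.

```python
def chunks_with_neighbors(paragraphs: list[str], budget: int = 1000) -> list[str]:
    """Build one chunk per paragraph, padded with neighboring text.

    budget is a character limit for the text itself; the blank-line
    separators between parts are not counted against it.
    """
    chunks = []
    for i, para in enumerate(paragraphs):
        spare = max(budget - len(para), 0)
        half = spare // 2  # split the remaining room between both neighbors
        # Guard against half == 0: s[-0:] would return the whole string.
        prev_ctx = paragraphs[i - 1][-half:] if i > 0 and half else ""
        next_ctx = paragraphs[i + 1][:half] if i + 1 < len(paragraphs) and half else ""
        parts = [p for p in (prev_ctx, para, next_ctx) if p]
        chunks.append("\n\n".join(parts))
    return chunks


if __name__ == "__main__":
    for chunk in chunks_with_neighbors(["aaaa", "bbbb", "cccc"], budget=8):
        print(repr(chunk))
```

Cutting the neighbors at a character offset can split a word in half; trimming to the nearest sentence boundary would be a natural refinement.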


We did chunks with a sliding window of previous page + current page + next page, with overlaps. That produced the best results.
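
The comment above doesn't include an implementation, so this is only a guess at the shape of it: a sliding window over a list of page texts, where each chunk is previous page + current page + next page. The names page_windows and radius are hypothetical.

```python
def page_windows(pages: list[str], radius: int = 1) -> list[str]:
    """For each page, join it with up to `radius` pages on either side."""
    windows = []
    for i in range(len(pages)):
        lo = max(i - radius, 0)               # clamp at the first page
        hi = min(i + radius + 1, len(pages))  # clamp at the last page
        windows.append("\n".join(pages[lo:hi]))
    return windows


if __name__ == "__main__":
    for window in page_windows(["p1", "p2", "p3", "p4"]):
        print(repr(window))
```

Each page ends up in as many as three chunks, so the index grows roughly threefold in exchange for the extra context.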


This would be much more useful if it used Vicuna or let you select a different model.


The link to your repo is returning a 404 now, whereas I could see it just a minute ago.



Stack Overflow newsletters[1] are great as well. They send you the top questions of the week, both answered and unanswered. Great way to learn small things about things you love. It's the perfect application of "Knowledge should be bite-sized".

I subscribe to the RPi, Net Eng, CS, theoretical CS and Code Golf newsletters. Any other suggestions?

[1] http://stackexchange.com/newsletters

edit: Added link


It took me 15 minutes to make this. I use my own framework that collects tweets based on hashtags and posts them to Tumblr and other social networks. It's probably my way of remembering the man who gave the world the device from which I am typing this.

I bought livelikesteve.com for $7.49 from GoDaddy and the ads are there just to get me back that cost. Waiting for DNS propagation.


Cameron Winklevoss Status: Enemy Facebook stake: .022%


I actually want to use it to ask out a girl. Any suggestions?


I was amused by the #protolol jokes on Twitter and wanted to collect them in one place, so I wrote a simple GAE Python application that searches for a particular hashtag and posts the selected tweets to your Tumblr blog. I'm fixing some usability issues and will post the link soon :)



There was a similar "gaping" hole 2 years back. http://news.ycombinator.com/item?id=164422 Better email them.

