angadsg · on May 29, 2023

https://www.techinasia.com/ -- they have high-quality tech journalism covering Asia

greggsy · on May 29, 2023

This actually had some good focus on Indonesia and SEA in general, which was a pleasant surprise

angadsg · on April 19, 2023

IMO folks are better off deploying their own version where they can adjust a few knobs (e.g. split chunk size) to get better results, given that PDF Q&A is such a commodity application.

Wrote a <50-line version with LangChain that runs in your terminal against any folder full of PDF documents - https://github.com/angad/dharamshala/blob/main/docs.py

return_source_documents is particularly helpful to get a sense of what is being sent in the prompt.

cs702 · on April 19, 2023

Consider adding a bit of overlap to the text chunks. Say, 300 characters (CharacterTextSplitter measures length in characters by default, not tokens):

  text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)

Otherwise, you'll likely end up with too many edge cases in which only part of a relevant context is retrieved :-)
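To make the effect of overlap concrete, here is a minimal plain-Python sketch. It is not how LangChain's CharacterTextSplitter works internally (that class splits on a separator first and then merges pieces); `split_with_overlap` is a hypothetical helper that simply slides a fixed-size character window, so consecutive chunks share `chunk_overlap` characters.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 300) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration only; lengths are in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

With overlap, a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.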

jcutrell · on April 19, 2023

This is actually pretty insightful - I have done something similar, splitting my Obsidian data into chunks using paragraphs and headers as demarcation, but this solves a more interesting problem of nuance! I like it.

summarity · on April 19, 2023

If you're interested in improved chunking, I mentioned a few strategies in my talk here (timestamp linked, <1min): https://youtu.be/elNrRU12xRc?t=536 that I used when building https://findsight.ai

cs702 · on April 19, 2023

If you're already splitting documents by paragraph, consider using (as much as possible of) the previous and next paragraphs as overlap.
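A rough sketch of that idea, assuming paragraphs are separated by blank lines and that the size budget is counted in characters. `paragraphs_with_context` and `max_chars` are made-up names for illustration; the budget is approximate because the joining blank lines are not counted.

```python
def paragraphs_with_context(text: str, max_chars: int = 1000) -> list[str]:
    """Return one chunk per paragraph, padded with the tail of the previous
    paragraph and the head of the next one, as far as max_chars allows.

    Hypothetical helper; assumes paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for i, para in enumerate(paragraphs):
        # Whatever room the paragraph leaves is shared equally by both neighbours.
        half = max(0, max_chars - len(para)) // 2
        prev_ctx = ""
        next_ctx = ""
        if half > 0:  # guard: a slice like s[-0:] would return the whole string
            if i > 0:
                prev_ctx = paragraphs[i - 1][-half:]
            if i + 1 < len(paragraphs):
                next_ctx = paragraphs[i + 1][:half]
        parts = [part for part in (prev_ctx, para, next_ctx) if part]
        chunks.append("\n\n".join(parts))
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for chunk in paragraphs_with_context(sample, max_chars=30):
        print(repr(chunk))
```

Cutting the context at a character count can split words; trimming to sentence boundaries would be a natural refinement.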

sergiotapia · on April 19, 2023

We did chunks with a sliding window of previous page + current page + next page, with overlaps. That produced the best results.
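For anyone who wants to try the page-window approach, a minimal sketch follows. It assumes the PDF text has already been extracted into one string per page; `page_windows` is a hypothetical name, and this is only a guess at the general shape, not the exact pipeline described above.

```python
def page_windows(pages: list[str]) -> list[str]:
    """Build one chunk per page made of previous + current + next page.

    Hypothetical helper; `pages` is assumed to hold the extracted text of
    each page in order. The first and last pages get two-page windows.
    """
    windows = []
    for i in range(len(pages)):
        lo = max(0, i - 1)            # previous page, if there is one
        hi = min(len(pages), i + 2)   # slice end is exclusive, so this includes the next page
        windows.append("\n".join(pages[lo:hi]))
    return windows


if __name__ == "__main__":
    for window in page_windows(["page 1", "page 2", "page 3", "page 4"]):
        print(repr(window))
```

Every page then appears in up to three chunks, which trades index size for better recall on passages that run across page breaks.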

chaxor · on April 19, 2023

This would be much more useful if it used Vicuna or let you select a different model.

dabedee · on April 19, 2023

The link to your repo is returning a 404 now, whereas I could see it just a min ago.

angadsg · on March 20, 2018

Created a pi-hole friendly blocklist https://gist.githubusercontent.com/angad/3db2da1cb50a4432c9e...

angadsg · on March 12, 2017

Stack Exchange newsletters[1] are great as well. They send you the top questions of the week, both answered and unanswered. It's a great way to learn small things about the topics you love, and the perfect application of "Knowledge should be bite-sized".

I subscribe to the RPi, Net Eng, CS, theoretical CS and Code Golf newsletters. Any other suggestions?

[1] http://stackexchange.com/newsletters

edit: Added link

angadsg · on Oct 9, 2011

It took me 15 minutes to make this. I use my own framework that collects tweets based on hashtags and posts them to Tumblr and other social networks. It's probably my way of remembering the man who gave the world the device on which I am typing this.

I bought livelikesteve.com for $7.49 from GoDaddy, and the ads are there just to earn back that cost. Waiting for DNS propagation.

angadsg · on July 10, 2011

Cameron Winklevoss Status: Enemy Facebook stake: .022%

angadsg · on June 10, 2011

I actually want to use it to ask out a girl. Any suggestions?

angadsg · on June 10, 2011

I was amused by the #protolol jokes on Twitter and wanted to collect them in one place, so I wrote a simple GAE Python application that searches for a particular hashtag and posts the selected tweets to your Tumblr blog. Fixing some usability issues. Will post the link soon :)

angadsg · on April 23, 2011

page not found http://i.imgur.com/cj4vz.png

angadsg · on March 19, 2011

There was a similar "gaping" hole 2 years back. http://news.ycombinator.com/item?id=164422 Better email them.