No. 28157 Anonymous 26th September 2022 Monday 9:15 pm
This thread is now about scraping websites
I didn't want to bump off the last thread so I'm co-opting this one.
I've been looking at making a spreadsheet of most of the different types of wood. It looks like the guy here has already done that and has a table backing the search results on his site, but I can't access the table itself. I'm assuming there must be a table because the filter has quite a lot of variables.
>>28157 I've done an enormous amount of this kind of work.
There are basically two ways to scrape a site. The simple way is to grab the HTML using the curl library (it works in just about every language) and then parse it. That often doesn't work if it's a modern site using JavaScript rendering, so in those cases you need to script/control a browser module. Most people do that with WebKit / Safari, but it isn't straightforward at all.
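For the simple way, here is a rough sketch in Python. It swaps curl for the standard library's urllib and html.parser, and it assumes the data you want sits in ordinary <tr>/<td> table markup that is already in the HTML the server sends. The names TableParser, parse_rows and fetch_html are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return every table row found in an HTML string as a list of cells."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url):
    """Download a page as text (hypothetical helper, not called here)."""
    # Some sites reject requests that carry no User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Made-up sample markup, just to show the shape of the result.
sample = "<table><tr><th>Wood</th><th>Type</th></tr><tr><td> Oak </td><td>hardwood</td></tr></table>"
rows = parse_rows(sample)
```

Against a real page you would call parse_rows(fetch_html(url)) with the page's address. If that comes back empty, the table is probably being built by JavaScript after the page loads, which is the case where you need the browser route instead.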
Looking at the HTML on that site, it seems like it's a WordPress site in disguise. The HTML isn't tagged well, either. In short, it will be challenging to parse whatever you use, and it will take a lot of time to get right. Depending on how big that site is, you might be better off spending a day manually cutting/pasting it up.
>>28158 Right, I've messaged the creator asking for the table. Thanks for the advice but if it's going to be messy even after learning something, I reckon I'll just do it manually provided he doesn't get back to me.
In the meantime you can expect my game involving trees to be released within the next 30 or so years.