technology

>> No. 27939 Anonymous
13th June 2021
Sunday 5:20 pm
How do I make Notepad++ recognise multiline strings in embedded JavaScript?

(A good day to you Sir!)
>> No. 28157 Anonymous
26th September 2022
Monday 9:15 pm
This thread is now about scraping websites
I didn't want to bump off the last thread so I'm co-opting this one.

I've been looking at making a spreadsheet of most different types of wood. It looks like the guy here has already done that and has a table backing up the search results on his site, but I can't access the table itself - I'm assuming there must be a table because the filter has quite a lot of variables.

https://www.wood-database.com/wood-filter/

What's the best way to get a csv of the underlying data? Do I have to learn python or javascript?
>> No. 28158 Anonymous
26th September 2022
Monday 9:27 pm
>>28157
I've done an enormous amount of this kind of work.

There are basically two ways to scrape a site. The simple way is to grab the HTML using the curl library (it works in just about every language) and then parse it. That often doesn't work if it's a modern site using JavaScript rendering, so in those cases you need to script/control a browser. Most people do that with a headless browser engine such as WebKit or Chromium, but it isn't straightforward at all.

Looking at the HTML on that site, it seems like it's a WordPress site in disguise. The HTML isn't tagged well either. In short, it will be challenging to parse whatever you use, and it will take a lot of time to get right. Depending on how big that site is, you might be better off spending a day manually cutting/pasting it up.
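
For what it's worth, the "simple way" looks roughly like this in Python, using only the standard library instead of curl bindings. Treat it as a sketch under assumptions, not something that will work on that site as-is: it assumes the data sits in plain `<table>` rows in the HTML the server sends back, which may well not be the case there. The names `TableRows`, `fetch_html` and `html_to_csv` are made up for this example, the `User-Agent` header is a guess at what a server might want, and `SAMPLE` is invented markup rather than anything copied from the real page.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch_html(url):
    """Download a page. The browser-like User-Agent is an assumption."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def html_to_csv(html):
    """Turn every table row found in the HTML into one CSV line."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample markup, only here so the sketch runs offline.
SAMPLE = """
<table>
  <tr><th>Common name</th><th>Hardness</th></tr>
  <tr><td>Example wood A</td><td>1,000</td></tr>
  <tr><td>Example wood B</td><td>500</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_to_csv(SAMPLE))
```

To try it against a real page you'd call `html_to_csv(fetch_html(url))`. If that comes back empty, the rows are probably being filled in by JavaScript after the page loads, and you're into the second, browser-driven approach.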
>> No. 28159 Anonymous
26th September 2022
Monday 9:47 pm
>>28158
Right, I've messaged the creator asking for the table. Thanks for the advice but if it's going to be messy even after learning something, I reckon I'll just do it manually provided he doesn't get back to me.

In the meantime you can expect my game involving trees to be released within the next 30 or so years.
