2010-03-30

Help request: extending Finance::Quote

It’s not very common for me to ask explicitly for help with writing new software, but since this is something I have no experience with, in a language I don’t know, and it is not mission-critical for any of my jobs, I don’t really feel like working on this myself.

Since right now I not only have a registered freelancing job, but also have to take care of most, if not all, of the house expenses, I’ve started keeping my money in check through Gnucash, as I said before. This makes it much easier to see how much (actually, how little) money I make, and how much of it I can save away or spend on enjoying myself from time to time (to avoid burning out).

Now, there is one thing that bothers me: to save away the money that I owe the government in taxes (both the VAT I have to pay and the other taxes), I subscribed to a security fund, which I pay into regularly (if I have the money available, of course!). Unfortunately, I need to explicitly go and look up the data on my bank’s website to know exactly how much money I have stashed in there at any time.

Gnucash obviously has a way to solve this problem: it uses the Finance::Quote Perl module to fetch the data from a longish series of websites, mostly through scraping. Let’s not even start talking about the chances that those websites have changed their structure in the months since the 1.17 release of the module (hint: at least one has, since I tried it out manually and it only gets a 404 error). Even Yahoo, which at least accepts the ISIN of the fund, does not give me any data for the current value of the share.

Now, the fund is managed by Pioneer Investments, and they do provide the data, via a very simple, ISIN-based URL! Unfortunately, they provide that data only… in PDF. This does not seem too bad: the data is available in text form, because pdftotext extracts it properly, and it is clearly marked, since the line before it is always a fixed string. On the other hand, I have no idea how one would scrape a PDF, especially in Perl, and even worse within Finance::Quote!

If somebody feels like helping me out, the URL for the PDF file with the data is the following, and the grep command will tell you what to look for in the PDF’s text. If you can help me out with this I’ll be very glad. Thanks!

# wget 'http://www.pioneerinvestments.it/it/webservice/pdfDispatcher.jhtml?doccode=ilpunto&from=02008FON&isin=IT0000388204'
# pdftotext pioneer_monetario_euro_a.pdf* - | grep 'Valore quota' -A 2
Valore quota

13,158
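
The grep above only shows the value to a human. As a rough sketch of how it could be picked out mechanically, the shell function below reads pdftotext output on standard input and prints the first line containing a digit after the fixed "Valore quota" string, with the decimal comma turned into a dot. The name extract_quota is made up for this illustration, and the sketch assumes the value carries no thousands separator; it is not how Finance::Quote does anything.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of any tool):
# reads pdftotext output on stdin and prints the share value that follows
# the fixed "Valore quota" label.
extract_quota() {
    awk '
        # Fixed label found: remember it and move on to the next line.
        /Valore quota/ { seen = 1; next }

        # First line with a digit after the label is taken as the value.
        seen && /[0-9]/ {
            gsub(/[ \t\r]/, "")   # drop stray whitespace
            sub(/,/, ".")         # decimal comma to dot: 13,158 becomes 13.158
            print
            found = 1
            exit
        }

        # Exit non-zero when no value was found, so callers can detect it.
        END { exit !found }
    '
}
```

With the file downloaded as above, it would be used as `pdftotext pioneer_monetario_euro_a.pdf - | extract_quota`. The non-zero exit status on failure is there so that a caller can tell "no value found" apart from an empty value.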





			FinanceQuote
GnuCash
PDF
Perl
Scraping


			
			
				
									
				
					
					
						Flameeyes					

										Comments 1
										
										
													
			
				
					
												Anonymous says:					


					
						2010-04-01 at 08:15					


									


				
Scraping the number value out of the PDF is easy enough once you have the PDF. The following Perl snippet will do the right thing as a freestanding tool. I have not looked at Finance::Quote to see how to integrate it, but since this snippet prints the value to stdout, it should be easy to convert to a subroutine.

#!/usr/bin/perl
use strict;
use warnings;

# Spawn pdftotext as a subprocess and connect its stdout to $pipe
open my $pipe, "/usr/bin/pdftotext pioneer_monetario_euro_a.pdf - |"
    or die "Failed to pipe: $!\n";

# Track whether we have seen the required fixed string yet
my $state = 0;
while (<$pipe>) {
    if (/Valore quota/) {
        $state = 1;
        next;
    }
    # After the required string has been seen, the first line with a
    # decimal digit is considered to be the sought value.
    if ($state == 1 && /[0-9]/) {
        print $_;
        last;
    }
}

If no one gives you a more complete answer within the next couple of days, I will grab Finance::Quote and try to solve the full problem. In the meantime, I hope this gets you (or another reader) on the right track.
				

