nomadagile.blogg.se - Import pdf info into excel

#Import pdf info into excel how to
#Import pdf info into excel code

    # -*- coding: cp1252 -*-
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter

    lines = list(filter(bool, string.split('\n')))

    txtList = convert_pdf_to_txt(factura).splitlines()
    nameIdx, billNumIdx, priceIdx, expirDateIdx, paymentIdx = -1, -1, -1, -1, -1

    for idx, line in enumerate(txtList):
        if line == "EMAIL: …":
            nameIdx = idx + 2  # in your example it should be +2

    # An index of -1 means the label was never found, so fall back to ''
    name = txtList[nameIdx] if nameIdx != -1 else ''
    billNum = txtList[billNumIdx] if billNumIdx != -1 else ''
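The index bookkeeping in this fragment can be tried on a small made-up list of lines. This is only a sketch: `find_indices`, `value_at` and the sample values are hypothetical, and the offsets for "FACTURA" and "Vencimientos:" are assumptions based on how the question describes the invoice layout.

```python
def find_indices(txt_list):
    """Scan the lines once and remember where each value should be.

    An index of -1 means the label was not found.
    """
    indices = {"billNum": -1, "expirDate": -1, "price": -1}
    for idx, line in enumerate(txt_list):
        if line == "FACTURA":
            indices["billNum"] = idx + 1      # assumed: value is on the next line
        elif line == "Vencimientos:":
            indices["expirDate"] = idx + 1    # assumed: date is on the next line
            indices["price"] = idx + 2        # assumed: price is two lines below
    return indices


def value_at(txt_list, idx):
    """Return the line at idx, or '' when idx is -1 or out of range."""
    return txt_list[idx] if 0 <= idx < len(txt_list) else ''


# Made-up sample data, not taken from a real invoice.
sample_lines = ["FACTURA", "0001", "Vencimientos:", "18/06/2015", "120,50"]
found = find_indices(sample_lines)
for field, idx in found.items():
    print(field, "=", value_at(sample_lines, idx))
```

Keeping the indices separate from the lookup means a missing label yields an empty string instead of silently reading the last line of the list, which is what `txtList[-1]` would do.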


Let's take a simpler example that I hope represents your issue.

You have a string stringPDF like this:

    name1 \n

A value is X lines after a name (in your example X is often 1, sometimes 2, but let's just say it can be any number). \n represents the line breaks (when you print the string, it prints on multiple lines).

First, we convert the string to a list of lines by splitting where there are line breaks:

    >>> stringList = stringPDF.split("\n")

Depending on your string, you may need to clean it. Here I have some extra whitespace at the end ('name1 ' instead of 'name1'). I use a list comprehension and strip() to remove it:

    >>> stringList = [line.strip() for line in stringList]

Once we have a proper list, we can define a simple function that returns a value, given the name and X (X lines between name and value):

    def get_value(l, name, Xline):
        indexName = l.index(name)       # find the index of the name in the list
        indexValue = indexName + Xline  # add X to this index
        return l[indexValue]            # return the line found at that index

Thanks for your help. I took code from the two examples you gave me, and now I can extract all the info I want.
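Put together, the steps from this answer can be run on a small made-up string. This is a sketch, not the answerer's exact code: the sample text, the snake_case names and the `default` parameter are assumptions added for illustration.

```python
def get_value(lines, name, x_line, default=''):
    """Return the line that sits x_line lines after the line equal to name.

    Returns default when the name is missing or the offset runs past the end.
    """
    try:
        return lines[lines.index(name) + x_line]
    except (ValueError, IndexError):
        return default


# Made-up text in the same shape as the example: trailing spaces, one blank line.
string_pdf = "name1 \nvalue1 \nname2 \n\nvalue2 \n"

# Split on line breaks, then strip the extra whitespace from every line.
string_list = [line.strip() for line in string_pdf.split("\n")]

print(get_value(string_list, "name1", 1))
print(get_value(string_list, "name2", 2))
```

Returning a default instead of raising keeps a monthly batch running when one invoice lacks a label; whether that is preferable to failing loudly depends on how the results are checked afterwards.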


Every month I need to extract some data from PDF files. I can convert a .pdf file to text, but I'm not sure how to extract and save the specific information I want.

Now I have this code:

    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import TextConverter

    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)

    print(convert_pdf_to_txt("FA20150518.pdf"))

This prints the text of the PDF, so now I have the text returned by convert_pdf_to_txt.

I want to extract this information: customer, bill number, price, expiration date and payment method.

The customer name is always below the "EMAIL: …" line.
The bill number is always below "FACTURA".
The price is always two lines below "Vencimientos:".
The expiration date is always below "Vencimientos:".

If I can convert this text into a list, I could pick out each value by its position relative to these labels. After that, I don't know how to save the data in Excel, but I'm sure I can find examples in the forum; first I need to extract only this data.
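For the saving step, one low-dependency option is to write a CSV file, which Excel opens directly. The sketch below assumes one dict per invoice; the field names, the `save_rows` helper, the semicolon delimiter and the sample values are all hypothetical. A native .xlsx file would need a third-party library such as openpyxl instead.

```python
import csv
import io

# Hypothetical column names for the five values the question asks for.
FIELDS = ["customer", "bill_number", "price", "expiration_date", "payment"]


def save_rows(rows, out_file, delimiter=";"):
    """Write one header row plus one row per invoice to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample row, not taken from a real invoice.
rows = [{
    "customer": "Example Customer",
    "bill_number": "0001",
    "price": "120,50",
    "expiration_date": "18/06/2015",
    "payment": "Transfer",
}]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
save_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with `open("invoices.csv", "w", newline="")` in place of the buffer. The semicolon is a guess that suits locales where the comma is the decimal separator; change `delimiter` if Excel puts everything in one column.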