The same way you would extract English, Russian, Hebrew, or any other language. Any modern, properly written tool for extracting content from a PDF can handle text in any language, provided that the PDF itself is also well constructed.
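To make "well constructed" a bit more concrete: one thing that usually matters is that each font in the PDF carries a ToUnicode CMap, which tells an extractor which Unicode character each glyph code stands for. The sketch below is only an illustration of that idea, not how any particular extraction tool works. The CMap fragment is hand-written for this example, and the names `TOUNICODE_CMAP`, `parse_bfchar`, and `decode_glyph_codes` are hypothetical. It assumes fixed two-byte glyph codes and handles only `bfchar` entries.

```python
import re

# Hand-written ToUnicode CMap fragment (an assumption for illustration,
# not taken from a real file). It maps two glyph codes to Cyrillic letters.
TOUNICODE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0001> <0414>
<0002> <0430>
endbfchar
endcmap
"""


def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} from the bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            # The destination is UTF-16BE encoded text written as hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_glyph_codes(hex_string, mapping, code_bytes=2):
    """Decode a hex string of fixed-width glyph codes using the mapping.

    Codes without an entry become U+FFFD, which is roughly what you see
    when a PDF lacks usable Unicode information for a font.
    """
    raw = bytes.fromhex(hex_string)
    chars = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


mapping = parse_bfchar(TOUNICODE_CMAP)
text = decode_glyph_codes("00010002", mapping)
```

The language does not matter here: the same lookup works for Hebrew, Russian, or English, as long as the mapping is present. When it is missing or wrong, the extractor has nothing reliable to decode with, whatever the language.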
All I'm asking is that you consider whether it is possible to keep the entire text of your document (assuming it is an academic or governmental document, like the ones I often prepare for my students) as a single large PDF file. Is this technically and legally feasible? EDIT: My next post will cover the technical and legal details.