b3 / hacks-website
Commit bbf23883 authored 11 years ago by Bruno BEAUFILS
Added html2csv.
parent 52550bac
Showing 2 changed files with 108 additions and 0 deletions:
README.md  +2 −0
html2csv  +106 −0
README.md  +2 −0
@@ -14,6 +14,8 @@ dependency possible. They are mainly written in perl.
- `get-element` -- print specified HTML elements data (or attribute value)
- `html2csv` -- export HTML tables in CSV format
- `htmltoc` -- generate table of contents from headings in (x)HTML
- `htmltree` -- print HTML tree
html2csv  0 → 100755  +106 −0
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use File::Temp qw/tempfile/;
use HTML::TableExtract;
use open qw/:std :utf8/;        # Ensure UTF-8 support

# Documentation

=pod

=encoding UTF-8

=head1 NAME

html2csv - Export HTML tables into CSV

=head1 SYNOPSIS

=over

=item html2csv [OPTIONS...] [FILE...]

=item html2csv -h

=back

=head1 OPTIONS

=over

=item B<-s> I<STRING>, --separator I<STRING>

Use I<STRING> instead of comma as field separator.

=item B<-n>, --no-protection

Do not quote data in each field.

=item B<-q> I<CHAR>, --quote I<CHAR>

Use I<CHAR> instead of double-quote for data quotation.

=item B<-h>, B<--help>

Print short help message.

=item B<--man>

Print full documentation.

=back

=head1 DESCRIPTION

Print data found in HTML tables read from standard input (or specified files)
in CSV (comma-separated values). Each field is double-quoted and separated by
a comma.

=cut

# Command line parameters
my $separator = ",";
my $quote = '"';
my $protect = 1;
if (!GetOptions('separator|s=s' => \$separator,
                'quote|q=s' => \$quote,
                'no-protection|n' => sub { $protect = 0; },
                'man' => sub { pod2usage(-verbose => 2, -noperldoc => 1); },
                'help|h' => sub { pod2usage(-verbose => 1, -noperldoc => 1); }))
{
    pod2usage("Syntax error!\n");
}

# HTML::TableExtract object construction
my $te = HTML::TableExtract->new();

# Parse HTML data from files
local $/;
$te->parse(<>);

# Process every table
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        # Protect cell contents
        if ($protect) {
            map {
                if ($_) {
                    $_ =~ s/$quote/$quote$quote/g;
                    $_ = "$quote$_$quote";
                }
            } (@$row);
        }
        # I cannot use join because some cells may be undefined (if empty)
        foreach (@{$row}) {
            if ($_) {
                print "$_$separator";
            }
            else {
                print "$separator";
            }
        }
        print "\n";
    }
}
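
The quoting rule used in the loop above (double any embedded quote character, then wrap the cell in quotes) can be isolated in a small function. What follows is a minimal sketch, not the committed code. The helper name `csv_line` is hypothetical, and the sketch deliberately differs from the script in three ways: it tests cells with `defined` rather than truthiness, so a cell containing `0` is kept; it escapes the quote character with `\Q...\E` before using it in the pattern; and it uses `join`, so no separator is printed after the last field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of html2csv): format one table row as a CSV line.
# Undefined cells become empty, unquoted fields. Defined cells are quoted, with
# every embedded quote character doubled.
sub csv_line {
    my ($row, $separator, $quote) = @_;
    my @fields = map {
        my $cell = $_;                  # copy, so the caller's row is untouched
        if (defined $cell) {
            $cell =~ s/\Q$quote\E/$quote$quote/g;
            $cell = "$quote$cell$quote";
        }
        else {
            $cell = '';
        }
        $cell;
    } @$row;
    return join($separator, @fields);
}

# prints: "a ""b""",,"0"
print csv_line(['a "b"', undef, 0], ',', '"'), "\n";
```

Mapping undefined cells to empty strings first is what makes `join` usable here, which is the case the script's own comment is concerned with.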